Transactions
on INFORMATION THEORY


Volume IT-3 MARCH, 1957 Number 1


In This Issue 


Frontispiece page 2
Editorial page 3 
The IRE « Affiliate’ Plan page 4 
Contributions page 5 
Contributors page 81 


For complete Table of Contents, see page 1.


PUBLISHED BY THE


Professional Group on Information Theory


IRE PROFESSIONAL GROUP ON INFORMATION THEORY 


The Professional Group on Information Theory is an organization, 
within the framework of the IRE, of members with principal profes- 
sional interest in Information Theory. All members of the IRE are eligible 
for membership in the Group and will receive all Group publications 
upon payment of the prescribed annual assessment of $2.00. 


ADMINISTRATIVE COMMITTEE 


Chairman
M. J. DiToro
Polytechnic Research and Development Co., Inc.
Brooklyn 1, New York


Vice-Chairman
Wilbur B. Davenport, Jr.
Lincoln Laboratories
Massachusetts Institute of Technology
Cambridge 39, Massachusetts


Secretary-Treasurer
Sid Deutsch
Microwave Research Institute
Brooklyn 1, New York


T. P. CHEATHAM 
Melpar, Inc. 
Boston, Mass. 


Harold Davis

Department of Engineering 
University of California 
Los Angeles 14, California 


Louis A. de Rosa

Federal Telecommunication 
Laboratories, Nutley, New 
Jersey 


Donald B. Duncan
North American Aviation 
Downey, California 


R. M. Fano 

Research Laboratory of Electronics 
Massachusetts Institute of Technology 
Cambridge 39, Massachusetts 


Laurin G. Fischer
Federal Telecommunication Labs., 
Nutley, New Jersey 


W. D. White
Airborne Instruments Laboratory, Inc.
160 Old Country Road
Mineola, New York


M. J. E. Golay
Ridge Road and Auldwood Lane,
Rumson, N. J.


Harold R. Holloway
American Machine and
Foundry Co., Greenwich,
Connecticut


Ernest R. Kretzmer
Bell Telephone Laboratories
Murray Hill, New Jersey


Nathan Marchand
Marchand Electronic Laboratories
Greenwich, Connecticut


Winslow Palmer
Sperry Gyroscope Company
Great Neck, New York


F. L. H. M. Stumpers
N. V. Philips Gloeilampenfabrieken
Research Labs., Eindhoven, Netherlands


IRE TRANSACTIONS


on Information Theory 


Published by the Institute of Radio Engineers, Inc., for the Professional 
Group on Information Theory at 1 East 79th Street, New York 21, N. Y. 
Responsibility for the contents rests upon the authors, and not upon the 
Institute, the Group or its members. Single copy prices: IRE-PGIT mem- 
bers, $2.20; IRE members, $3.30; nonmembers, $6.60. 


Copyright © 1957, The Institute of Radio Engineers, Inc.
All rights, including translation, are reserved by the IRE. Requests for republication
privileges should be addressed to the Institute of Radio Engineers, 1 E. 79th St., New York 21, N. Y.


IRE Transactions
on
Information Theory


Published Quarterly by the Professional Group on Information Theory 


Volume IT-3 March, 1957 Number 1 


TABLE OF CONTENTS


PAGE
Frontispiece  Philip M. Woodward  2
Entropy and Negentropy  Philip M. Woodward  3
The IRE "Affiliate" Plan  W. R. G. Baker  4


Contributions


On the Estimation in the Presence of Noise of the Impulse Response of a Random, Linear Filter  George L. Turin  5
The Output Signal-to-Noise Ratio of Correlation Detectors  Paul E. Green, Jr.  10
Error Rates in Pulse Position Coding  L. Lorne Campbell  18
The Part of Statistical Considerations in the Separation of a Signal Masked by a Noise  Jean A. Ville  24
On a Cross-Correlation Property for Stationary Random Processes  John L. Brown, Jr.  28
A Systematic Approach to a Class of Problems in the Theory of Noise and Other Random Phenomena, Part I  D. A. Darling and A. J. F. Siegert  32
A Systematic Approach to a Class of Problems in the Theory of Noise and Other Random Phenomena, Part II, Examples  Arnold J. F. Siegert  38
On the Capacity of a Noisy Continuous Channel  Saburo Muroga  44
Merit Criteria for Communication Systems  A. Hauptschein and L. S. Schwartz  52
First-Order Markov Process Representation of Binary Radar Data Sequences  George C. Sponsler  56
Automatic Bias Control for a Threshold Detector  J. Dugundji and E. Ackerlind  65
Exact Integral Equation Solutions and Synthesis for a Large Class of Optimum Time Variable Linear Filters  Julius S. Bendat  71


Contributors  81


IRE TRANSACTIONS ON INFORMATION THEORY 


Philip M. Woodward 


Philip M. Woodward is a mathematician at the 
Radar Research Establishment, England. Born in 
Staffordshire, he was educated at Blundell’s School 
and Wadham College, Oxford, where he was a 
Methuen Scholar in mathematics. After gradua- 
tion in 1941, he joined “T.R.E.” (now R.R.E.) 
and during the war years worked on radio pro- 
pagation under Dr. Henry G. Booker. 

He has written various papers on antenna theory, 
noise theory, and computing. In the field of information
theory (broadly defined), his work with I. L.
Davies was an attempt to express radar problems
in an absolute form, and this led to the publication 
in 1953 of a book entitled ‘‘Probability and Informa- 
tion Theory with Applications to Radar.” In 1956, 
he visited the United States as a Gordon Mackay 
Visiting Lecturer at Harvard University. His 
present interest lies in the development of auto- 
matic programming for electronic computers. 

Mr. Woodward is a Chartered Electrical Engi- 
neer, a member of the Ratio Club, and for recreation 
an amateur clockmaker and harpsichordist. 




Entropy and Negentropy 


PHILIP M. WOODWARD 


Many of us are not physicists. We are radio 
engineers and applied mathematicians, but our work 
borders so closely on physics that we take a par- 
ticular interest in the reaction of the physicists to 
our special topic. It has been a rather unfortunate 
one. Certainly there is a relationship between the 
mathematical theory of communication and statist- 
ical thermodynamics, if only because both employ 
a familiar formula

$$X = -\sum_i p_i \log p_i$$

to represent a certain fundamental quantity. This 
quantity springs naturally from what we might call 
“average logarithmic counting’’—of different pos- 
sible messages in communication theory and 
quantum states in physics. The coincidence, if such 
it can be called, was noted from the start both by 
Wiener and by Shannon, and it gave rise to a wide- 
spread feeling that the two subjects ought in some 
way to coalesce. But whereas X appears in both in
chapter one, a purposeful divergence takes place
in chapter two because the aims are quite different.
Interaction between information theory and statistical
physics has thus been limited to little more
than an interchange of terms. Yet even in this there 
have been some widespread misconceptions which 
we ought to try finally to clear up. 

First, what are the facts? Communication theorists 
have borrowed the term entropy from the physicists 
to describe the quantity X, and following Shannon 
they use the letter H for it. Physicists use the letter 
S, and their units are different, so we have 

Shannon's H = X (dimensionless entropy)
Physicists' S = kX (physical entropy).

But the letter H also figures in Boltzmann's H-Theorem
where, in spite of a remark to the contrary
in Shannon's original paper, we find that

Boltzmann's H = −X (negative dimensionless entropy).
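
The three conventions can be compared numerically. The following is only an illustrative sketch: the probability values are hypothetical, natural logarithms are assumed, and k is assumed to be Boltzmann's constant (the editorial writes only "kX").

```python
import math

BOLTZMANN_K = 1.380649e-23  # joules per kelvin; assumed meaning of the editorial's k


def average_log_count(probabilities):
    """Return X = -sum(p_i * ln p_i); terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


# Hypothetical probabilities of three messages (or three quantum states).
probabilities = [0.5, 0.25, 0.25]
X = average_log_count(probabilities)

shannon_H = X                  # Shannon's H: dimensionless entropy
physical_S = BOLTZMANN_K * X   # physicists' S = kX: physical entropy
boltzmann_H = -X               # Boltzmann's H: same magnitude, opposite sign

print("X in nats:", X)
print("X in bits:", X / math.log(2))
print("S = kX in J/K:", physical_S)
print("Boltzmann's H:", boltzmann_H)
```

Changing the base of the logarithm or multiplying by k rescales X but cannot change its sign; only Boltzmann's H carries the opposite sign.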


Although physicists have often felt some uneasiness 
about the sign of H in information theory, this 
particular clash of notation does not appear to have 
been the cause of it. Rather it was the use of a 
single quantity to measure simultaneously uncer-
tainty and information which originally gave rise
to difficulty. An explanation was given in 1952 by 
D. K. C. MacDonald writing in the Journal of 
Applied Physics, and the last word—so we thought— 
had been said. But recently an eminent physicist, 
Leon Brillouin, has remarked that Shannon’s H 
is negative entropy because it is reduced and not in- 
creased by the action of an irreversible filter. It seems 
to have been a hasty conclusion, and the probable 
explanation of the puzzle is this. In the communica- 
tion problem, we are interested only in the electrical 
form of the signals and not in the heat dissipated 
by them when they flow through resistors in filters. 
The physicist must include this disorganized radia- 
tion, and his total entropy will then be found to 
increase without his having to change any defini- 
tions. 

Whatever the true explanation may be, it should 
be made clear without any doubt or hesitation that 
the term entropy is used in information theory by 
mathematical analogy. The expression X, which we 
call the entropy of a set of probabilities, is identical 
in form and in sign with the expression for entropy 
in physics. The units may differ, and the prob- 
abilities stand for different things, but neither of 
these can change the sign of X. The pure information 
theorist, of course, is left unmoved by arguments 
about mere names. In a mathematical theory, 
definitions are judged solely in the light of the 
theorems to which they lead, and word-names are 
arbitrary, even irrelevant. But when we apply the 
theory, or compare two different theories, it is 
surely important to avoid using one name for two 
different things or two opposite names for one and
the same thing.




The IRE “Affiliate” Plan—A New Venture in 


Engineering Society Structure and Service 


W. R. G. BAKER, Chairman 


IRE Professional Groups Committee 


On January 4, 1957, the IRE Board of Directors 
arrived at a decision which may in time prove to 
be one of the most far-reaching in its 45-year his- 
tory. On that date the Board adopted a plan which 
will enable non-IRE members whose main professional
interests lie outside the sphere of IRE ac-
tivities to become affiliated with certain of the IRE 
Professional Groups without first having to join the 
IRE itself. 

This plan is aimed at those specialists in other 
fields of science and technology whose work touches 
upon our own electronics and communications 
field only in specialized areas. In effect, the IRE 
is extending the specialized services of its Profes- 
sional Groups to every field of science and en- 
gineering. 

An outstanding example of where these services 
are needed may be found in the case of the medical 
and biological sciences. At the present time some 
1400 IRE members enjoy the privileges of mem- 
bership in the Professional Group on Medical 
Electronics. And yet there are hundreds, perhaps 
thousands, of medical doctors, biologists, and 
others to whom the activities of this Group would 
be of interest and value. Both they and the Group 
would benefit from their participation. To require 
these persons, who have no interest in radio en- 
gineering, to join the IRE in order to join the 
Group is unreasonable, and probably futile as well. 
In fact, it was largely to provide an answer to this 
particular problem that the Affiliate Plan was 
first conceived, although it pertains to other fields 
as well, such as Computers, etc. 

The Affiliate Plan is admittedly an experiment. 
So far as is known, no other society has ever tried 
a similar scheme. The Board of Directors feels 
strongly that the benefits afforded by the plan 
justify the risk that some persons who should 
join the IRE will instead become Affiliates. To 
minimize this risk, the plan has been carefully 
worked out along the following lines: 


1) Participation in the plan is at the option of 
each Professional Group. It is not expected that 
all Groups will adopt it; only those which feel it 
serves a need in their particular field. 


2) Each Group interested in initiating the 
Affiliate Plan must submit to the Chairman of the 
Professional Groups Committee a list of accredited 
organizations which has been selected and ap- 
proved by its Administrative Committee, for 
official approval by the IRE Executive Committee. 


3) To be an Affiliate of a Professional Group, a 
person must belong to an accredited organization 
approved by that Group and the IRE Executive 
Committee. Moreover, he shall not have been an 
IRE member during the five years prior to his 
application. He may affiliate with more than one 
Group, provided the accredited organization to 
which he belongs is recognized by the Groups con- 
cerned. 


4) The fee for Affiliates shall be the assessment 
fee of the Group, plus $4.50. The latter covers IRE 
subsidies to the Group, Professional Group over- 
head expenses borne by IRE Headquarters, and 
50 cents which is to be rebated to IRE Sections 
for mailing and meeting costs. 


5) An Affiliate will be entitled to receive the 
Transactions of his Group and that part of the 
IRE National Convention Record pertaining to 
his Group. He will be eligible for a Group award, 
and may attend local or national meetings of the 
Group by payment of charges assessed Group 
members. 


6) An Affiliate cannot serve in an elective office
in the Group or Group Chapter, nor vote for candi-
dates for these offices.


7) An Affiliate may hold an appointive office in
the Group or Group Chapter.


8) An Affiliate may not receive any IRE benefits
that are derived through IRE membership.

The Affiliate Plan is a bold and farsighted ven- 
ture; one that recognizes and provides for the 
rapidly spreading influence of electronics in every 
walk of scientific and technological life, and one 
that enables the IRE to further its aims as a pro- 
fessional engineering society—the advancement of 
radio engineering and related fields of engineering 
and science. 




On the Estimation in the Presence of Noise of the
Impulse Response of a Random, Linear Filter*


GEORGE L. TURIN†


Summary: A sounding signal is transmitted through a transmission medium which may be characterized as a linear filter whose impulse response is random, i.e., drawn according to some probability law from an ensemble of possible impulse responses. To the output of the medium is added random noise; the resultant waveform is the received signal. A receiver is required to operate on this received signal so as to make a linear, minimum-mean-square-error estimate of the impulse response of the transmission medium.

Two problems concerned with the design of such a sounding system are considered in this paper. The first is the determination of the transfer function of the optimum linear estimating filter in the receiver. The second is the optimization of the spectrum of the transmitted sounding signal.


INTRODUCTION 
ONE IS OFTEN faced with the problem of making measurements on a transmission medium whose characteristics are random, or at least unknown. This problem might arise, for example, in the case of communication through a random-multipath medium such as the ionosphere.¹⁻⁴ In such a case, it usually is advantageous from the point of view of system performance to obtain by measurement a more exact knowledge of the strengths and delays of the various paths than that which is available a priori. Measurement would in this case be a means to the end of more reliable communication. On the other hand, measurement may be an end in itself, as in the radar case. Here, the transmission medium consists of a group of targets (and the space between the radar set and the targets), and one is interested in measuring the delays of signals reflected from these targets.

In general, one will not be able to determine the characteristics of the medium exactly: one must make any measurement in the presence of noise, and, barring an infinite measurement time, this precludes an unequivocal measurement. In other words, one must be content with estimates of the pertinent characteristics.


* Manuscript received by the PGIT, July 5, 1956. The research reported in this document was supported jointly by the U.S. Army, Navy, and Air Force under contract with the Mass. Inst. Tech. It is abstracted from a doctoral thesis submitted to the Dept. of Elec. Eng., M.I.T., May, 1956.

† Hughes Aircraft Co., Culver City, Calif. Formerly with Dept. of Elec. Eng. and Lincoln Lab., M.I.T., Cambridge, Mass.

¹ G. L. Turin, "Communication Through Noisy, Random-Multipath Channels," Sc.D. thesis, M.I.T., Cambridge, Mass.; 1956. Also published as Tech. Rep. 116, Lincoln Lab., M.I.T.; May 14, 1956.

² G. L. Turin, "Communication through noisy, random-multipath channels," 1956 IRE Convention Record, Part IV, pp. 154-166.

³ R. Price, "The detection of signals perturbed by scatter and noise," IRE Trans., PGIT-4, pp. 163-170; September, 1954.

⁴ W. L. Root and T. S. Pitcher, "Some remarks on statistical detection," IRE Trans., vol. IT-1, pp. 33-38; December, 1956.


The choice of the “pertinent’’ characteristics and the 
type of estimation to be used depends on the requirements 
of the particular problem, as well as on mathematical 
tractability. In this paper, we shall assume (as we may, 
for instance, in the two examples cited above) that the 
transmission medium is linear, so that it may be com- 
pletely characterized by an impulse response function. We 
shall consider the special problem of making a minimum- 
mean-square-error estimate of this impulse response by 
means of a linear estimating filter. 

We shall, in particular, consider the measurement system of Fig. 1. A sounding signal, $x(t)$, of some finite duration, $T$, is assumed to be transmitted. This signal passes through the transmission medium, represented in Fig. 1 as a linear filter. In general, the impulse response of this filter may be a function of time. We shall assume here, however, that if the impulse response does vary with time, it varies slowly enough so that we may consider it essentially fixed during the transmission of the sounding signal. We may then, for our purposes, write the impulse response of the medium as a function of a single variable: $h_m(\tau)$. $h_m(\tau)$ is considered to be random, at least from the viewpoint of the transmitter and receiver; that is, it may be thought of as having been drawn, according to some probability law, from an ensemble of possible impulse responses.

[Fig. 1: The system under consideration. Block diagram: the sounding signal passes through the transmission medium; noise $n(t)$, with spectrum $N(f)$, is added to form $y(t)$; the estimating filter produces $g(t)$. The medium and the noise together form the channel.]

After transmission through the medium, the signal is perturbed by the addition of a random noise waveform, $n(t)$, which we assume is statistically independent of $h_m(\tau)$, and is statistically stationary. The noise-perturbed signal, $y(t)$, passes into a receiver. This consists of a linear filter with impulse response, $h_e(\tau)$, whose output, $g(t)$, is an estimate (as a function of time) of the impulse response of the transmission medium. It may be noted that the receiver may be located physically at a point distant from the transmitter, as in the communication case, or at the same point as the transmitter, as in the radar case.

We shall consider two design problems connected with the system of Fig. 1. The first is the determination of the impulse response, $h_{e,\mathrm{opt}}(\tau)$, of the estimating filter which makes the mean-square difference between $g(t)$ and $h_m(t)$ a minimum for a given $x(t)$. Then we shall investigate the problem of adjusting $x(t)$, holding $h_e(\tau) = h_{e,\mathrm{opt}}(\tau)$, so as to bring the mean-square error in the estimate down to an absolute minimum.

Throughout this paper we shall use Fourier transforms in which the frequency-domain variable is a cyclic, rather than a radian, frequency. That is, we shall use the transform pair

$$z(t) = \int_{-\infty}^{\infty} Z(f)\, e^{j2\pi ft}\, df$$ (1)

$$Z(f) = \int_{-\infty}^{\infty} z(t)\, e^{-j2\pi ft}\, dt$$ (2)

where $z(t)$ is the time-domain function, and $Z(f)$ its frequency-domain mate.
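
As a quick numerical check of the cyclic-frequency convention, the sketch below approximates the forward transform (2) by a midpoint sum and applies it to the Gaussian pulse $e^{-\pi t^2}$, which is its own transform under this convention. The function names, the integration limits, and the step count are illustrative assumptions, not part of the paper.

```python
import cmath
import math


def forward_transform(z, f, t_min=-8.0, t_max=8.0, steps=4000):
    """Midpoint-rule approximation of Z(f) = integral of z(t) exp(-j 2 pi f t) dt."""
    dt = (t_max - t_min) / steps
    total = 0j
    for k in range(steps):
        t = t_min + (k + 0.5) * dt
        total += z(t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt


def gaussian(t):
    """Test pulse exp(-pi t^2); its transform in cyclic frequency is exp(-pi f^2)."""
    return math.exp(-math.pi * t * t)


for f in (0.0, 0.5, 1.0):
    numeric = forward_transform(gaussian, f)
    exact = math.exp(-math.pi * f * f)
    print(f, abs(numeric - exact))
```

With a radian-frequency convention the same pulse would not transform into itself without extra factors of $2\pi$, which is why the choice of convention matters for the formulas that follow.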


THE OPTIMUM ESTIMATING FILTER


Suppose we know that the impulse response of the medium lies essentially inside of some interval, say $0 < \tau < \Delta$. Then the expression for the mean-square error of the output of the estimating filter may be defined as

$$\epsilon = E_{N,h}\left[\frac{1}{\Delta}\int_0^{\Delta} [g(t) - h_m(t)]^2\, dt\right]$$ (3)

where $E_{N,h}$ denotes a statistical average over the ensembles of possible noises and possible impulse responses of the medium. We minimize this error by varying the estimating-filter impulse response; that is, we set

$$\delta\epsilon = 0$$ (4)

where the variation ("$\delta$") is with respect to the estimating-filter impulse response.

It is shown in Appendix I that the transfer function of the estimating filter (i.e., the Fourier transform of $h_e(\tau)$) which satisfies (4) is

$$H_{e,\mathrm{opt}}(f) = \frac{1}{\Delta}\,\frac{X^*(f)}{\dfrac{N(f)}{\overline{|H_m(f)|^2}} + \dfrac{1}{\Delta}|X(f)|^2}$$ (5)

where the asterisk denotes "complex conjugate." In this expression, $X(f)$ is the sounding-signal voltage-density spectrum; $\overline{|H_m(f)|^2}$, the average power-transmission function of the medium; and $N(f)$, the noise power-density spectrum. Thus, the only statistic of the medium which one must have a priori is $\overline{|H_m(f)|^2}$; if we have no a priori knowledge of the medium, we set this equal to a constant. $\overline{|H_m(f)|^2}$ may of course be a statistic derived from information gained from previous measurements.

The filter of (5) may not be physically realizable, because the condition that the impulse response of a realizable filter must be zero for negative arguments is not used in the derivation in Appendix I. However, this is not a great problem; we can usually accept some delay in obtaining our estimate of $h_m(\tau)$, and $H_{e,\mathrm{opt}}(f)$ can usually be made realizable, at least to a very good approximation, by introducing sufficient delay. This delay will usually not be more than the order of $T$ seconds.

For the no-noise case, i.e., $N(f) = 0$, the solution reduces to the inverse filter

$$H_{e,\mathrm{opt}}(f) = \frac{1}{X(f)}.$$ (6)

This is, of course, to be expected, for in this case the voltage-density spectrum at the receiver input is $H_m(f)X(f)$, and the filter of (6) restores this to just $H_m(f)$, which is the Fourier transform of $h_m(\tau)$. Thus, in the no-noise case, $h_m(\tau)$ is estimated without error.
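
The reduction of the optimum filter to the inverse filter can be checked at a single frequency. The sketch below evaluates the optimum transfer function in the form $\frac{1}{\Delta} X^* / \left(N/\overline{|H_m|^2} + \frac{1}{\Delta}|X|^2\right)$ for decreasing noise levels; the function name and all numerical values are hypothetical, chosen only for illustration.

```python
def optimum_estimating_filter(X, N, Hm2, delta):
    """Optimum estimating-filter transfer function at one frequency.

    X     : complex value of the sounding-signal spectrum X(f)
    N     : noise power density N(f), assumed >= 0
    Hm2   : average power transmission of the medium, assumed > 0
    delta : length of the interval containing the impulse response
    """
    N_t = N / Hm2  # noise referred back to the transmitter
    return (X.conjugate() / delta) / (N_t + abs(X) ** 2 / delta)


X = 2.0 + 1.0j          # hypothetical spectrum value
inverse_filter = 1.0 / X

for N in (1.0, 1.0e-3, 0.0):
    He = optimum_estimating_filter(X, N, Hm2=0.5, delta=4.0)
    # He * X approaches 1 as the noise vanishes, i.e., He approaches 1 / X.
    print(N, He, abs(He * X))
```

For any positive noise level the product $H_{e,\mathrm{opt}}(f)X(f)$ is real and smaller than one, so the estimate is attenuated but not phase-shifted relative to the inverse-filter output.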

For $N(f) \neq 0$, comparison of (5) and (6) shows that the optimum filter may be expressed (see Fig. 2) as the cascade of the inverse filter of (6) and a filter with transfer function

$$H_s(f) = \frac{\dfrac{1}{\Delta}|X(f)|^2}{N_t(f) + \dfrac{1}{\Delta}|X(f)|^2}$$ (7)

where we have set

$$N_t(f) = \frac{N(f)}{\overline{|H_m(f)|^2}}.$$

[Fig. 2: Analysis of $H_{e,\mathrm{opt}}(f)$ into component filters.]

($N_t(f)$ is thus the noise power-density spectrum, referred back to the transmitter through the average medium.) The output of the inverse filter in Fig. 2 contains $h_m(\tau)$, but it also contains a large amount of noise, especially at those frequencies where $X(f)$ is small. The second filter attenuates the noise, but in doing so, smears $h_m(\tau)$. The optimization procedure may be thought of as one which makes the best compromise between eliminating noise and keeping $h_m(\tau)$ undistorted. $H_s(f)$ is a zero-phase filter, as one would expect, for any other phase (except a linear one, which is a trivial exception) would distort the desired output, $h_m(\tau)$, without helping to attenuate the effect of the noise, which has random phase anyway.

If $\Delta$ is of the order of magnitude of $T$, then the second term in the denominator of (5) is roughly the transmitted power-density spectrum. If this is small, for all $f$, compared to $N_t(f)$ (or, equivalently, if the average received-signal power-density spectrum is small compared to $N(f)$), then (5) becomes

$$H_{e,\mathrm{opt}}(f) \approx \frac{1}{\Delta}\,\frac{X^*(f)}{N_t(f)}.$$ (8)

If $N_t(f)$ is constant with frequency, which will occur, for example, if the noise is white and we have no a priori knowledge of the medium, then the filter of equation (8) is "matched" to the sounding signal; that is, $H_{e,\mathrm{opt}}(f)$ is proportional to the complex conjugate of $X(f)$, or, equivalently, $h_{e,\mathrm{opt}}(\tau)$ is proportional to $x(-\tau)$, the sounding signal reversed in time.

As has been suggested in the Introduction, the above results may have application to multipath-communication and radar problems. In both of these cases, the impulse response of the medium may be represented to a good approximation as a sequence of impulses (Dirac delta functions), as in Fig. 3(a). This figure is drawn for a four-path (four-target) medium. The signal component, $g_s(t)$, of the estimate (or, more practically, the envelope of the signal component) will look something like Fig. 3(b); a noise component will generally be superimposed on $g_s(t)$. [The nonzero widths of the pulses in Fig. 3(b) result from the fact that the bandwidths of the sounding signal and of the estimating filter are usually finite; in fact, the pulse widths will generally be of the order of the reciprocal of the sounding-signal bandwidth.] The strengths, $a_i$, and the delays, $\tau_i$, of the various paths (targets) may be estimated from the $g(t)$ waveform (although only the latter parameters are of interest in the radar case). Further (fine) delay information may be obtained from an examination of the phase of $g(t)$ at the envelope peaks.

In the multipath or radar context, in the one-path or one-target case, the filter derived in this paper bears a relationship to other filters derived by Van Vleck and Middleton⁵ and Dwork.⁶ These latter filters may be thought of as based on a criterion which requires that at some time corresponding to the path (target) delay, the filter output be as accurate an estimate as possible of the path (target) strength. The criterion used here requires further that the filter output be simultaneously as close to zero as possible at all other times. Eq. (8), for the one-path case [$N_t(f) = N(f)$], indicates that the two criteria lead to the same filter only in the limit of small signal-to-noise ratio.
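
The cascade decomposition of (7) and the small-signal limit of (8) can be illustrated at a single frequency. This is a sketch under assumed values: the function names are hypothetical, and the noise is specified directly through the referred-back quantity, written here as `N_t` (noise power density divided by the average power transmission of the medium).

```python
def optimum_filter(X, N_t, delta):
    """Optimum filter of (5), expressed through the referred-back noise N_t."""
    return (X.conjugate() / delta) / (N_t + abs(X) ** 2 / delta)


def second_filter(X, N_t, delta):
    """Second filter of the cascade, (7): real and between 0 and 1 (zero phase)."""
    a = abs(X) ** 2 / delta
    return a / (N_t + a)


def matched_filter(X, N_t, delta):
    """Small-signal approximation (8): proportional to the conjugate of X."""
    return X.conjugate() / (delta * N_t)


X, delta = 2.0 + 1.0j, 4.0   # hypothetical spectrum value and interval length

for N_t in (0.1, 10.0, 1.0e4):
    He = optimum_filter(X, N_t, delta)
    cascade = (1.0 / X) * second_filter(X, N_t, delta)   # inverse filter, then (7)
    ratio = abs(He) / abs(matched_filter(X, N_t, delta))
    # The first difference is zero up to rounding; the ratio tends to 1 as N_t grows.
    print(N_t, abs(He - cascade), ratio)
```

The ratio equals $N_t / (N_t + |X|^2/\Delta)$, so the matched-filter form is approached only when the referred-back noise dominates the transmitted power density, in line with the small signal-to-noise condition stated above.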


⁵ J. H. Van Vleck and D. Middleton, "A theoretical comparison of the visual, aural, and meter reception of pulsed signals in the presence of noise," J. Appl. Phys., vol. 17, pp. 940-971; November, 1946.

⁶ B. M. Dwork, "Detection of a pulse superimposed on fluctuation noise," Proc. IRE, vol. 38, pp. 771-774; July, 1950.


Turin: Impulse Response of a Random, Linear Filter 7 


[Fig. 3: (a) A typical multipath (multi-target) impulse response. (b) The signal component, $g_s(t)$, of an optimum estimate of (a), plotted against $t$.]


THE OPTIMUM SOUNDING-SIGNAL SPECTRUM

So far we have considered $X(f)$ to be arbitrary. Now let us, while keeping $H_e(f) = H_{e,\mathrm{opt}}(f)$, solve for the $X(f)$ which minimizes $\epsilon$, subject to the constraints that the energy in $x(t)$ be fixed and that $X(f)$ lie within a given band. That is, let us set

$$\int_{-\infty}^{\infty} |X(f)|^2\, df = K$$ (9)

and

$$X(f) = 0 \quad \text{for } f \text{ not in } F_t$$ (10)

where $F_t$ is the permitted band of frequencies, and solve the equation

$$\delta(\epsilon_m + \lambda K) = 0$$ (11)

where $\lambda$ is some constant.⁷ $\epsilon_m$ is the minimum mean-square error for an arbitrary $X(f)$.
The solution to (11) is shown in Appendix II to be

$$X_{\mathrm{opt}}(f) = \begin{cases} [N(f)]^{1/4}\left[\dfrac{1}{\sqrt{\lambda}} - \dfrac{\Delta\sqrt{N(f)}}{\overline{|H_m(f)|^2}}\right]^{1/2} e^{j\theta(f)}, & f \text{ in } F_1 \\ 0, & f \text{ in } F_2. \end{cases}$$ (12)

In this equation, $\theta(f)$ is an arbitrary phase function, $F_1$ is the set of all frequencies in $F_t$ for which

$$\frac{1}{\sqrt{\lambda}} \geq \frac{\Delta\sqrt{N(f)}}{\overline{|H_m(f)|^2}}$$

and $F_2$ is the set of all frequencies not in $F_1$. The constant, $\lambda$, is adjusted to satisfy the energy constraint, (9). The interpretation of (12) confirms one's intuitive notions. The first factor,

$$[N(f)]^{1/4},$$

indicates that if the noise power at some frequency, $f_1$, is very small, then little signal energy is needed at that frequency to determine $H_m(f_1)$. On the other hand, the second factor indicates that if the noise power at $f_1$ is very large, or the average power transmission of the medium is very small, then it is a waste of the limited available energy to put much, if any, energy at $f_1$; the energy may be used to more advantage elsewhere.

⁷ F. B. Hildebrand, "Methods of Applied Mathematics," Prentice-Hall, Inc., New York, N.Y.; 1952. See sec. 2.6.

If we place (12) in (5), we obtain the optimum estimating-filter transfer function corresponding to the optimum $X(f)$:

$$H_{e,\mathrm{opt}}(f) = \begin{cases} \sqrt{\lambda}\,[N(f)]^{-1/4}\left[\dfrac{1}{\sqrt{\lambda}} - \dfrac{\Delta\sqrt{N(f)}}{\overline{|H_m(f)|^2}}\right]^{1/2} e^{-j\theta(f)}, & f \text{ in } F_1 \\ 0, & f \text{ in } F_2. \end{cases}$$ (13)

We have assumed here that $N(f)$ is nonzero at all frequencies. The optimum filter of (13), as we should expect, has large gain at frequencies with small noise, and small, or zero, gain at frequencies with large noise.
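
A discretized sketch of this allocation follows. It assumes a band split into a few bins of width `df`, hypothetical noise and medium values, and a bisection search for the constant $c = 1/\sqrt{\lambda}$ that meets the energy constraint; none of these numerical choices come from the paper. The energy density in each bin is taken as $|X(f)|^2 = \sqrt{N(f)}\,\left[c - \Delta\sqrt{N(f)}/\overline{|H_m(f)|^2}\right]$ where the bracket is positive and zero elsewhere, which is the squared magnitude implied by (12).

```python
import math


def energy_density(c, N, Hm2, delta):
    """|X(f)|^2 implied by (12), with c standing for 1/sqrt(lambda)."""
    level = c - delta * math.sqrt(N) / Hm2
    return math.sqrt(N) * level if level > 0.0 else 0.0  # zero outside F_1


def solve_c(N_list, Hm2_list, delta, K, df):
    """Bisect on c = 1/sqrt(lambda) until the total energy equals K."""
    def energy(c):
        return df * sum(energy_density(c, n, h, delta)
                        for n, h in zip(N_list, Hm2_list))

    lo, hi = 0.0, 1.0
    while energy(hi) < K:      # grow the bracket until it contains the solution
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if energy(mid) < K:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical four-bin band: quiet, moderate, noisy, and badly attenuated bins.
N_list = [0.1, 0.2, 5.0, 0.1]       # noise power density per bin
Hm2_list = [1.0, 1.0, 1.0, 0.01]    # average power transmission per bin
delta, K, df = 1.0, 2.0, 1.0

c = solve_c(N_list, Hm2_list, delta, K, df)
X2 = [energy_density(c, n, h, delta) for n, h in zip(N_list, Hm2_list)]

# Filter magnitude from (13), which simplifies to |X| / (c * sqrt(N)) ...
He_13 = [math.sqrt(x2) / (c * math.sqrt(n)) for x2, n in zip(X2, N_list)]
# ... and from (5) evaluated with the same spectrum.
He_5 = [(math.sqrt(x2) / delta) / (n / h + x2 / delta)
        for x2, n, h in zip(X2, N_list, Hm2_list)]

for row in zip(N_list, Hm2_list, X2, He_13, He_5):
    print(row)
```

In this example the badly attenuated bin receives no energy at all, and the two ways of computing the filter magnitude agree bin by bin.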

The mean-square error corresponding to (12) and (13) is, from Appendix II:

$$\epsilon_m = \sqrt{\lambda}\int_{F_1} \sqrt{N(f)}\, df + \frac{1}{\Delta}\int_{F_2} \overline{|H_m(f)|^2}\, df.$$ (14)

The first term in this expression is the contribution to the error of the noise and smear components of the estimate which arise within the pass band of $H_{e,\mathrm{opt}}(f)$. The second is the smear contribution arising from the complete lack of an estimate of $H_m(f)$ in the stopband of $H_{e,\mathrm{opt}}(f)$.
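
The two-term error expression can be compared with a direct evaluation of the error integrand $\frac{1}{\Delta} N / \left(N/\overline{|H_m|^2} + \frac{1}{\Delta}|X|^2\right)$ on a discretized band. The sketch below uses hypothetical bin values and fixes $c = 1/\sqrt{\lambda}$ directly rather than solving for it; it also compares the result with a flat spectrum of the same total energy, purely as an illustration.

```python
import math


def energy_density(c, N, Hm2, delta):
    """|X(f)|^2 of the optimum spectrum, with c standing for 1/sqrt(lambda)."""
    level = c - delta * math.sqrt(N) / Hm2
    return math.sqrt(N) * level if level > 0.0 else 0.0


def error_direct(X2, N_list, Hm2_list, delta, df):
    """Discretized (1/delta) * integral of N / (N/Hm2 + |X|^2/delta)."""
    return df / delta * sum(n / (n / h + x2 / delta)
                            for x2, n, h in zip(X2, N_list, Hm2_list))


def error_two_terms(c, X2, N_list, Hm2_list, delta, df):
    """Pass-band term plus stop-band term, as in (14), with sqrt(lambda) = 1/c."""
    passband = sum(math.sqrt(n) for x2, n in zip(X2, N_list) if x2 > 0.0)
    stopband = sum(h for x2, h in zip(X2, Hm2_list) if x2 == 0.0)
    return df * (passband / c + stopband / delta)


# Hypothetical four-bin band and an assumed value of c.
N_list = [0.1, 0.2, 5.0, 0.1]
Hm2_list = [1.0, 1.0, 1.0, 0.01]
delta, df, c = 1.0, 1.0, 2.5

X2_opt = [energy_density(c, n, h, delta) for n, h in zip(N_list, Hm2_list)]
K = df * sum(X2_opt)                       # energy actually used
X2_flat = [K / (df * len(N_list))] * len(N_list)

err_opt = error_direct(X2_opt, N_list, Hm2_list, delta, df)
err_14 = error_two_terms(c, X2_opt, N_list, Hm2_list, delta, df)
err_flat = error_direct(X2_flat, N_list, Hm2_list, delta, df)

print("optimum, direct:   ", err_opt)
print("optimum, two terms:", err_14)
print("flat spectrum:     ", err_flat)
```

The direct and two-term evaluations coincide for the optimum spectrum, and the flat spectrum of equal energy gives a larger error in this example.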

We may actually be interested only in estimating $H_m(f)$ within the transmission band, $F_t$, instead of for all $f$ as we have done. That is, we may be interested in estimating the instantaneous impulse response of an equivalent medium which has zero transmission outside of $F_t$. This imposes no additional problem, however; it is easy to see that the result of (13) is optimum in this case also, since it is independent of values of $H_m(f)$ outside of the transmission band. The error of (14) also obtains in this case, except that the second integral is now taken over only those frequencies in $F_2$ which are within the transmission band (i.e., the intersection of $F_t$ and $F_2$).

We note as a final point that if the noise is white, i.e., $N(f)$ is constant, at least over the transmission band, the optimum estimator of (13) is proportional to the complex conjugate of the sounding-signal spectrum of (12). That is, in the white-noise case the optimum estimator is matched to the optimum sounding signal, regardless of the form of the average power-transmission spectrum of the medium, $\overline{|H_m(f)|^2}$.


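APPENDIX I


DERIVATION OF $H_{e,\mathrm{opt}}(f)$


We desire the solution of (4):

$$\delta\epsilon = \delta E_{N,h}\left[\frac{1}{\Delta}\int_0^{\Delta} [g(t) - h_m(t)]^2\, dt\right] = 0.$$ (15)

Expanding this, we obtain

$$E_{N,h}\left[\int_0^{\Delta} g(t)\,\delta g(t)\, dt\right] = E_{N,h}\left[\int_0^{\Delta} h_m(t)\,\delta g(t)\, dt\right]$$ (16)

since $\delta h_m(t) = 0$. Now $g(t)$, as indicated in Fig. 1, is the output of the estimating filter, whose unit-impulse response is $h_e(\tau)$. Since $y(t)$ is the filter input,

$$g(t) = \int_{-\infty}^{\infty} h_e(\tau)\, y(t - \tau)\, d\tau.$$ (17)

$y(t)$ is, in turn, the sum of the noise, $n(t)$, and the output of the transmission medium, whose unit-impulse response, $h_m(\tau)$, is to be estimated. That is

$$y(t) = \int_{-\infty}^{\infty} h_m(\tau)\, x(t - \tau)\, d\tau + n(t)$$ (18)

where $x(t)$ is the channel input (sounding signal). Combining (17) and (18):

$$g(t) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h_e(\tau)\, h_m(\sigma)\, x(t - \tau - \sigma)\, d\sigma\, d\tau + \int_{-\infty}^{\infty} h_e(\tau)\, n(t - \tau)\, d\tau.$$ (19)

The variation of $g(t)$ is thus

$$\delta g(t) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h_m(\sigma)\, x(t - \tau - \sigma)\, \delta h_e(\tau)\, d\sigma\, d\tau + \int_{-\infty}^{\infty} n(t - \tau)\, \delta h_e(\tau)\, d\tau.$$ (20)

Remembering that $n(t)$ and $h_m(\tau)$ are independent, and assuming that $E_N[n(t)] = 0$, we obtain for the right-hand side of (16)

$$E_h\left\{\int_0^{\Delta}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h_m(t)\, h_m(\sigma)\, x(t - \tau - \sigma)\, \delta h_e(\tau)\, d\sigma\, d\tau\, dt\right\}.$$ (21)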


Similarly, the left-hand side of (16) is

$$E_h\left\{\int_0^{\Delta}\!\int\!\!\int\!\!\int\!\!\int_{-\infty}^{\infty} h_e(\tau')\, h_m(\sigma')\, h_m(\sigma)\, x(t - \tau' - \sigma')\, x(t - \tau - \sigma)\, \delta h_e(\tau)\, d\sigma'\, d\tau'\, d\sigma\, d\tau\, dt\right\} + \Delta \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h_e(\tau')\, \phi_N(\tau - \tau')\, \delta h_e(\tau)\, d\tau'\, d\tau.$$ (22)

In deriving (22) we have assumed statistically-stationary noise with autocorrelation function

$$\phi_N(\tau) = E_N[n(t)\, n(t + \tau)].$$ (23)

Since we have assumed that $\Delta$ is essentially greater than the duration of $h_m(\tau)$, the limits on the first integral sign in (21) may be extended to $(-\infty, +\infty)$ without changing the value of the integral. The same statement may be made approximately about the first term in (22); for this term is the integral of the product of the "signal" component⁸ of the estimate of $h_m(\tau)$ and the variation of this component, and one would not expect the signal component to last appreciably longer than $h_m(\tau)$ itself.⁹

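Thus extending the limits, and equating (21) and (22), we get

$$\int_{-\infty}^{\infty} \delta h_e(\tau)\, d\tau\; E_h\left[\int\!\!\int\!\!\int\!\!\int_{-\infty}^{\infty} h_e(\tau')\, h_m(\sigma')\, h_m(\sigma)\, x(t - \tau' - \sigma')\, x(t - \tau - \sigma)\, d\sigma'\, d\tau'\, d\sigma\, dt + \Delta\int_{-\infty}^{\infty} h_e(\tau')\, \phi_N(\tau - \tau')\, d\tau' - \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h_m(t)\, h_m(\sigma)\, x(t - \tau - \sigma)\, d\sigma\, dt\right] = 0.$$ (24)

We shall neglect the physical realizability condition, $h_e(\tau) = 0$ for $\tau < 0$. Then $\delta h_e(\tau)$ is arbitrary for all $\tau$, and in order for (24) to be satisfied, the factor which multiplies $\delta h_e(\tau)$ must be zero for all $\tau$. Setting it equal to zero, Fourier transforming the resulting equation, and averaging over the ensemble of all possible transmission-medium impulse responses, we obtain

$$H_e(f)\,\overline{|H_m(f)|^2}\,|X(f)|^2 + \Delta\, H_e(f)\, N(f) - \overline{|H_m(f)|^2}\, X^*(f) = 0.$$ (25)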

Eq. (5) follows from this immediately. In order to show that this solution yields a minimum, rather than a maximum or inflectional, error, one merely finds the second variation of $\epsilon$, and shows that this is positive for $H_e(f) = H_{\mathrm{opt}}(f)$.

⁸ I.e., the first term in (19).

⁹ The approximation here is very good if the reciprocal of the bandwidth of $x(t)$ is small compared to the duration of $h_m(\tau)$. This condition is also necessary if the fine detail in $h_m(\tau)$ is to be highly resolved by the sounding system.

Turin: Impulse Response of a Random, Linear Filter


APPENDIX II


DERIVATION OF $X_{\mathrm{opt}}(f)$


To derive (12) for the optimum spectrum of $x(t)$, we start with (3) for the mean-square error. Using (19) in this, and remembering that the average of the bracketed expression in (24) is identically zero, we obtain for the minimum mean-square error


hake an, [ na(d) dt 
= if hein a)... 7) tlc 7 ee endl. ar}. (26) 


This minimum error is also expressible in terms of frequency-domain functions; using Parseval's theorem in (26) and averaging:


= 4| [TOE ar 


=f He) Ha) 


XG) ar}, (27) 


Using (5) for H,,,,(f), (27) becomes 


Ges | 
NPAs eae Ge, df (28) 


€opt 


w= if TROP! 


where 


N(f) 


LD = ee 
= FOr 


We now constrain the energy in the transmitted waveform to be constant:

$$\int_{-\infty}^{\infty} |X(f)|^2\, df = K. \tag{29}$$


In order to find the optimum $X(f)$, we must solve the variational problem

$$\delta(\epsilon_{\mathrm{opt}} + \lambda K) = 0 \tag{30}$$

where $\lambda$ is some constant. Using (28) and (29), (30) becomes


ae 


If we constrain $|X(f)|^2$ to be zero outside a certain band, $F$, [cf. (10)], then $\delta|X(f)|^2$ is also zero there, and (31) is satisfied for those frequencies. For frequencies within the band, on the other hand, we must try to set the bracketed


NCA) | H(A) | 
LNG Gale 


| 6 | X(f) df = 0. 
(31) 




term in the integrand equal to zero.¹⁰ This leads to the equation


\ | X(f) |* + 2AAN,(f) | X(f) 
+ N.C) Se, Oa l= 0: 


2 


(32) 


If (32) and (29) can be simultaneously satisfied within 
the band F’, by a non-negative function | X(f) |”, then the 
solution is complete. If, however, there are frequencies 


¹⁰ One might be tempted to obtain another solution, $X(f) = 0$, by writing $\delta|X(f)|^2 = 2|X(f)|\,\delta|X(f)|$. This is a spurious solution, however, for the problem is actually phrased completely in terms of $|X(f)|^2$ [cf. (28) and (29)]. The second solution would disappear if we replaced $|X(f)|^2$ by, say, $S(f)$.




at which | X(f) |? would be negative, then it may be 
shown! that the correct solution is to satisfy (32) and (29) 
simultaneously at all frequencies for which | X(f) |? 
turns out non-negative, while at the same time setting 
| X(f) |? equal to zero at all other frequencies; this is the 
solution indicated in (12). 

Finally, (14) follows directly on substitution of (12) 
into (28). 


ACKNOWLEDGMENT 


The author would like to extend his thanks to Prof. 
R. M. Fano and Dr. W. B. Davenport, Jr. of M.I.T. for 
their help and guidance in this research. 


The Output Signal-to-Noise Ratio of Correlation Detectors*


PAUL E. GREEN, JR.†


Summary—Expressions are derived for the output signal-to- 
noise ratio of a correlation detector when the two input functions to 
be correlated differ only by the presence of an arbitrary linear filter 
in each path, and the addition of noise to each. It is assumed that 
the signal and noises are Gaussian with arbitrary power density 
spectra, and the integration is performed by a filter of arbitrary 
transfer function. Two types of correlation detectors are distin- 
guished, the low-pass detector in which the integrator is a low-pass 
filter, and the band-pass detector in which one of the two input 
functions is deliberately displaced in frequency by A and the 
integrator is therefore a band-pass filter tuned to A. Output signal- 
to-noise ratio expressions for the two types are almost identical. 


INTRODUCTION 


THE USE of cross correlation for the detection of signals in noise has been treated from several points of view. Lee, Cheatham, and Wiesner¹ discussed the use of a sampling correlator for this function, and Fano² and Davenport³ extended the treatment to the type of correlator which compared continuous waveforms rather than sample values of these waveforms.⁴ The present paper treats the continuous correlator in a more general fashion than was done in the papers of Fano and Davenport.

* Manuscript received by the PGIT, January 17, 1957. The research in this document was supported jointly by the U. S. Army, Navy, and Air Force under contract with Mass. Inst. of Tech.

† Lincoln Lab., M.I.T., Lexington, Mass.

¹ Y. W. Lee, T. P. Cheatham, and J. B. Wiesner, "The Application of Correlation Functions in the Detection of Small Signals in Noise," Tech. Rep. No. 141, Res. Lab. of Electronics, M.I.T.; October 13, 1949.

² R. M. Fano, "Signal to Noise Ratio in Correlation Detectors," Tech. Rep. No. 186, Res. Lab. of Electronics, M.I.T.; February 19, 1951.

³ W. B. Davenport, Jr., "Correlator Errors Due to Finite Observation Intervals," Tech. Rep. No. 191, Res. Lab. of Electronics, M.I.T.; March 8, 1951.

The action of a correlation detector is to multiply two waveforms together and perform an integration or smoothing of the product. The situation is depicted in Fig. 1. The detector consists of the multiplier and integrating filter. Usually there is some relationship between the two input waveforms [indicated in Fig. 1 as $u_1(t)$ and $u_2(t)$] causing them to have a nonzero correlation. As a matter of fact, in most practical cases, they are the same waveform $x(t)$ disturbed by the addition of noise [$n_1(t)$ and $n_2(t)$], and possibly also distorted dissimilarly in some other way. The effect of the noise at the detector output tends to zero as the duration of the integration grows indefinitely. In a particular system application, the practical necessity of limiting this integration time to finite values produces a finite signal-to-noise ratio at the output. It is this signal-to-noise ratio that specifies the performance of the detector.

[Fig. 1. Correlation detector. Block diagram showing the signal, the perturbing function, the additive noises $n_1(t)$ and $n_2(t)$, the inputs $u_1(t)$ and $u_2(t)$, the multiplier, the integrating filter, and the output.]

⁴ The two are, of course, equivalent if the sampling rate is sufficiently high.

This paper presents an analysis of the output signal-to-noise ratio of a correlation detector under the following assumptions:

1) The signal $x(t)$, and the noises $n_1(t)$ and $n_2(t)$, which are inserted additively, are independent stationary and ergodic random functions of time with Gaussian first- and second-order amplitude distributions, and with Fourier-transformable power density spectra $X(\omega)$, $N_1(\omega)$, and $N_2(\omega)$, respectively, all confined to $2\pi W$, a closed interval in $\omega$. For simplicity of computation, a single-sided frequency spectrum is used for all functions of time, whereas a double-sided representation is employed for network system functions.

2) The only perturbing function causing any dissimilarity in $u_1(t)$ and $u_2(t)$ not accounted for by the added noises, is a linear, but not necessarily physically realizable, time-invariant network with a complex system function $H(\omega) = |H(\omega)|\, e^{j\eta(\omega)}$, Fourier-transformable into the impulse response $h(t)$.⁵


In this treatment, such a filter is included in only one of the two correlator inputs, since the presence of a different filter in each input can be treated by readjusting the signal $x(t)$ to represent one correlator input while lumping the difference in the two perturbing filters into the single filter shown in Fig. 1. The presence in the analysis of the perturbing filter can be exploited in many ways. For example, we will use it later to represent a time shift between the two signal components at the correlator inputs. It can also be used to represent the presence of dissimilar filtering of the two signals, and of such propagation effects as time-invariant multipath and dispersion. These latter details will not be pursued in this paper.


3) The multiplier is an ideal four-quadrant multiplier whose output is the instantaneous product of the values of $u_1$ and $u_2$.

4) The integrating filter is a realizable two-terminal pair device with complex system function $I(\omega)$, Fourier-transformable into the filter impulse response $i(t)$.


In the particular case where $i(t)$ is a rectangular pulse of duration $T$, we will call the filter an ideal integrator of integration time $T$. For other appropriate forms of $i(t)$, there will turn out to be an effective integration time $T$ equal to the reciprocal of the effective noise bandwidth of the filter.

⁵ The presence of a perturbing function in the form of a time-invariant nonlinear distortion of amplitudes has been treated in the following references: J. J. Bussgang, "Crosscorrelation Functions of Amplitude-Distorted Gaussian Signals," Tech. Rep. No. 216, Res. Lab. of Electronics, M.I.T.; March 26, 1952, and R. D. Luce, "Amplitude distorted signals," R.L.E. Quarterly Prog. Rep., pp. —41; April 15, 1953.

Our problem will be to compute the output signal-to-noise ratio, that is, the ratio of the square of the dc output voltage from the integrating filter to the fluctuation power at the same point. Specifically, we proceed as follows: An infinite ensemble of sets of the three functions $x(t)$, $n_1(t)$, and $n_2(t)$ is considered, all of period $\theta$, where $\theta$ is long compared to the duration of significant values of the filter response $i(t)$. For one set of the three functions $x$, $n_1$, and $n_2$ out of the ensemble of such sets, each function is expanded in a Fourier series, thus giving three line spectra, where each line represents a Fourier amplitude and phase coefficient pair. Then the operations indicated in Fig. 1 are carried out on these functions to derive the amplitude and phase line spectrum at the multiplier output. The power in each spectral line is then found from the square of the amplitude, and the ensemble average power in the various spectral lines is then computed based on known statistical properties of the Fourier coefficients. Then $|I(\omega)|^2$, the squared magnitude of the integrator's system function, is applied to this discrete power spectrum at the multiplier output. The final step is to allow the Fourier period $\theta$ to grow without limit, whereupon the summations involving the Fourier coefficients become integrals involving the power density spectra $X(\omega)$, $N_1(\omega)$, and $N_2(\omega)$. The power density spectrum of the integrator output is a line impulse at the origin representing the dc output signal, plus a continuous spectrum representing fluctuations. The ratio of these two is the desired output signal-to-noise ratio.
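The definition of output signal-to-noise ratio used here (squared dc output divided by fluctuation power) can be illustrated with a small Monte Carlo experiment. This is only a sketch under simplifying assumptions that are not in the paper: discrete-time white Gaussian samples, a unit perturbing filter, and an ideal integrator that averages `T` product samples. For that toy model the fluctuation power per product sample is 2P² + Pσ₁² + Pσ₂² + σ₁²σ₂², so the predicted ratio is T / (2 + 1/ρ₁ + 1/ρ₂ + 1/(ρ₁ρ₂)), where the 2 is the self-noise contribution.

```python
import math
import random


def detector_output(T, p_sig, p_n1, p_n2, rng):
    """Average of T products u1*u2 for one ensemble member (ideal integrator)."""
    s_sig, s_n1, s_n2 = math.sqrt(p_sig), math.sqrt(p_n1), math.sqrt(p_n2)
    acc = 0.0
    for _ in range(T):
        x = rng.gauss(0.0, s_sig)            # common signal sample
        u1 = x + rng.gauss(0.0, s_n1)        # upper input: signal plus noise 1
        u2 = x + rng.gauss(0.0, s_n2)        # lower input: signal plus noise 2
        acc += u1 * u2
    return acc / T


def output_snr(T, p_sig, p_n1, p_n2, trials=2000, seed=1):
    """Estimate (dc output)^2 / (fluctuation power) over an ensemble of trials."""
    rng = random.Random(seed)
    outs = [detector_output(T, p_sig, p_n1, p_n2, rng) for _ in range(trials)]
    mean = sum(outs) / trials
    var = sum((v - mean) ** 2 for v in outs) / (trials - 1)
    return mean * mean / var


def predicted_snr(T, p_sig, p_n1, p_n2):
    """Closed form for the white-sample toy model described above."""
    r1, r2 = p_sig / p_n1, p_sig / p_n2
    return T / (2.0 + 1.0 / r1 + 1.0 / r2 + 1.0 / (r1 * r2))
```

Calling `output_snr(100, 1.0, 1.0, 1.0)` and `predicted_snr(100, 1.0, 1.0, 1.0)` should give comparable values, and doubling `T` should roughly double both, which is the growth of output signal-to-noise ratio with integration time described in the text.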

In the next section we will treat in this way the corre- 
lation detector just as shown in Fig. 1. Then we will treat 
a variation of this in which the two input functions are 
displaced in frequency and the integrating filter is a 
band-pass rather than a low-pass filter. We distinguish 
between these two types by calling them low-pass and 
band-pass type correlation detectors, respectively. 


LOW-PASS CORRELATION DETECTOR


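For each set of the ensemble of signal and noise functions, we have the following Fourier expansions

$$x(t) = \sum_{i=1}^{\theta W} \xi_i \cos(\omega_i t + \phi_i), \tag{1}$$

$$n_1(t) = \sum_{i=1}^{\theta W} \nu_i \cos(\omega_i t + \psi_i), \tag{2}$$

and

$$n_2(t) = \sum_{i=1}^{\theta W} \mu_i \cos(\omega_i t + \vartheta_i). \tag{3}$$

Then

$$u_1(t) = y(t) + n_1(t) = \sum_{i=1}^{\theta W} \xi_i h_i \cos(\omega_i t + \phi_i + \eta_i) + n_1(t) \tag{4}$$

and

$$u_2(t) = x(t) + n_2(t) \tag{5}$$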


where $\omega_0$ represents the lower edge of the band of width $W$ cps placed to include all signal and noise components, $\omega_i = \omega_0 + 2\pi i/\theta$, and $h_i$ and $\eta_i$ are the perturbing filter amplitude and phase functions $|H(\omega_i)|$ and $\arg[H(\omega_i)]$, respectively. It is known⁶ that when $x(t)$ is a stationary Gaussian random process having power density spectrum $X(\omega)$, each $\xi_i$ has a Rayleigh distribution over the ensemble with

$$\overline{\xi_i^2} = 2X(\omega_i)\,\Delta\omega \tag{6}$$

and

$$\overline{\xi_i^4} = 8X^2(\omega_i)(\Delta\omega)^2 \tag{7}$$

for sufficiently small $\Delta\omega = 2\pi/\theta$. If $i \neq j$, $\xi_i$ and $\xi_j$ are independent, as are $\phi_i$ and $\phi_j$. The phase angle $\phi_i$ has a probability distribution which is flat from $-\pi$ to $+\pi$. Similar statements hold for $n_1(t)$ in relation to $\nu_i$, $\psi_i$, and the power density spectrum $N_1(\omega)$ and for $n_2(t)$ in relation to $\mu_i$, $\vartheta_i$, and $N_2(\omega)$.
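The second and fourth moments in (6) and (7) are those of a Rayleigh-distributed amplitude. As a quick numerical check, the sketch below draws the amplitude as the length of a two-dimensional Gaussian vector; the function name and the variable `sigma2`, which stands for the product X(ω_i)Δω, are illustrative choices and not notation from the paper.

```python
import math
import random


def rayleigh_moments(sigma2, samples=200000, seed=7):
    """Estimate the mean square and mean fourth power of a Rayleigh amplitude.

    The amplitude is sqrt(a^2 + b^2) with a, b independent zero-mean
    Gaussians of variance sigma2 (sigma2 plays the role of X(w_i) * dw).
    """
    rng = random.Random(seed)
    s = math.sqrt(sigma2)
    m2 = 0.0
    m4 = 0.0
    for _ in range(samples):
        a = rng.gauss(0.0, s)
        b = rng.gauss(0.0, s)
        r2 = a * a + b * b          # squared amplitude
        m2 += r2
        m4 += r2 * r2
    return m2 / samples, m4 / samples
```

For `sigma2 = 0.5` the two estimates should come out close to 2·0.5 = 1 and 8·0.5² = 2, in line with (6) and (7).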

At the multiplier output there will be four distinct contributions from the product

$$u_1(t)\,u_2(t) = [y(t) + n_1(t)][x(t) + n_2(t)], \tag{8}$$

which we will designate the $X \times Y$, $X \times N_1$, $Y \times N_2$, and $N_1 \times N_2$ terms, using the subscripts I, II, III, and IV, respectively. Fig. 2 depicts the line spectrum of the power of one ensemble member of $x(t)$, $n_1(t)$, and $n_2(t)$, and shows how $u_1(t) = y(t) + n_1(t)$ and $u_2(t) = x(t) + n_2(t)$. It also shows how the operation of the multiplier produces the signal and noise components. (There are also components about the double frequency that will be ignored since they can be presumed to lie outside the integrating filter pass band.) Note the large dc output signal resulting from the coherence between $x(t)$ and $y(t)$.

Appendix I contains the detailed bookkeeping involved in carrying out the steps outlined in the introduction. For a given ensemble member, the discrete Fourier amplitude and phase spectra at the multiplier output are first computed [see (23), (29), and (34)]. From these, the corresponding discrete ensemble average power spectra are deduced, making use of the statistical independence between the signals and noises $x(t)$, $n_1(t)$, and $n_2(t)$. The lack of such independence in $x(t)$ and $y(t)$ leads to a behavior in the $X \times Y$ term [(25) and (28)] that is entirely different from that in any of the other three [(31) through (33) and (35) through (37)]. In particular, the $X \times Y$ term contains a component at dc [second term of (25)] that does not approach zero in the limit as $\theta \to \infty$. This is the correlated signal output. By multiplying this by the integrating filter's dc response, and then letting $\theta \to \infty$, we have the final signal output power, $S_0$, (40). By multiplying all the other terms by the correct value of the integrating filter response and then taking the limit, we have the noise output power, $N_0$, (41). The ratio of the two is the desired signal-to-noise ratio result.

[Fig. 2. Showing the expansion of $x(t)$, $n_1(t)$, and $n_2(t)$ of finite period $\theta$ into Fourier series; how the multiplication operation produces a strong dc spike (the output signal) from the coherence between $x(t)$ and $y(t)$; and how the integrating filter passes this spike, plus a small portion of the other components (the output fluctuations). Legible panel labels: (b) $N_1$, (c) $N_2$, (d) $|H|^2$, (e) $U_1 = |H|^2 X + N_1$, (f) $U_2 = X + N_2$, (g) $U_1 \times U_2$.]

⁶ S. O. Rice, "Mathematical analysis of random noise," Bell Sys. Tech. J., sec. 2.8; July, 1944 and January, 1945.


(3) 2 2 1(0) | / " X(@RelH(@)| te} | 


| { [| 1) ? e(lx(o 71) XH) 
+ XW)*N,(e) + (XW) | H&) PY*Nole) 
+ Ni(w)*No()] ta) 'G 
in which the operation "$*$" is defined by

$$A(\omega) * B(\omega) = \int_0^{\infty} [A(u)\,B(u + \omega) + A(u + \omega)\,B(u)]\, du \tag{10}$$

and the abbreviation Re means "real part of."
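The "$*$" operation of (10) is straightforward to evaluate on sampled spectra. The sketch below is one possible discretization, assuming both spectra are given on the same uniform grid `u_i = i * du` and are zero outside the sampled range; the function name `star` is illustrative.

```python
def star(a, b, du):
    """Discrete version of A(w) * B(w) from (10).

    a, b : equal-length lists of spectral samples on the grid u_i = i * du
           (both spectra are assumed to vanish outside the sampled range).
    Returns samples of the result at w_k = k * du.
    """
    if len(a) != len(b):
        raise ValueError("spectra must be sampled on the same grid")
    n = len(a)
    out = []
    for k in range(n):
        total = 0.0
        for i in range(n - k):
            # A(u) B(u + w) + A(u + w) B(u), with u = i*du and w = k*du
            total += a[i] * b[i + k] + a[i + k] * b[i]
        out.append(total * du)
    return out
```

For two identical flat spectra of unit height and width W, this returns approximately 2(W − ω) for 0 ≤ ω ≤ W, the triangular shape one expects from convolving two rectangular power spectra.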


BAND-PASS CORRELATION DETECTOR


Suppose in Fig. 1 we use as the lower detector input signal $u_2(t)$, exactly the same function as before, except displaced downward in radian frequency by A. This can be taken into account by rewriting the Fourier expansions (3) and (5) as

$$n_2(t) = \sum_{i=1}^{\theta W} \mu_i \cos(\omega_i t - At + \vartheta_i) \tag{3a}$$

and

$$u_2(t) = \sum_{i=1}^{\theta W} \xi_i \cos(\omega_i t - At + \phi_i) + n_2(t). \tag{5a}$$

We assume that the integrating filter is tuned to A (that is, the frequency of maximum response of the filter is A) and that A is at least equal to $2\pi W$, the bandwidth of significant values of signal and noises. We now seek as $(S/N)_0$ the ratio of one half the squared amplitude of the difference frequency tone at A to the remaining power, representing fluctuations.

Notice that the band-pass detector does not perform the true operation of cross-correlation. The cross-correlation function of two signals with nonoverlapping spectra is identically zero. Nevertheless, we shall see that the band-pass scheme, as a detector of signals, is in most respects equivalent to the correlation operation.
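A small noise-free numerical example makes this point concrete. The sketch below is illustrative only: the harmonic numbers, amplitudes, sample count, and unit perturbing filter are hypothetical choices. It builds a four-line signal, a copy displaced downward in frequency, and examines their product. The average (dc) value of the product is zero, yet a tone appears at the displacement frequency with amplitude equal to half the sum of the squared line amplitudes.

```python
import math
import random

N = 1024                               # samples in one Fourier period
SIGNAL_LINES = [200, 201, 202, 203]    # harmonic numbers of the signal lines
XI = [1.0, 0.7, 1.3, 0.5]              # line amplitudes (hypothetical)
SHIFT = 20                             # frequency displacement, in harmonics

rng = random.Random(3)
PHI = [rng.uniform(-math.pi, math.pi) for _ in SIGNAL_LINES]


def waveform(k, offset):
    """Sample k of the line spectrum, displaced downward by `offset` harmonics."""
    t = 2.0 * math.pi * k / N
    return sum(a * math.cos((h - offset) * t + p)
               for a, h, p in zip(XI, SIGNAL_LINES, PHI))


def line_amplitude(samples, m):
    """Amplitude of the spectral line at harmonic m (0 < m < len(samples)/2)."""
    n = len(samples)
    c = sum(v * math.cos(2.0 * math.pi * m * k / n) for k, v in enumerate(samples))
    s = sum(v * math.sin(2.0 * math.pi * m * k / n) for k, v in enumerate(samples))
    return 2.0 * math.hypot(c, s) / n


product = [waveform(k, 0) * waveform(k, SHIFT) for k in range(N)]
dc_level = sum(product) / N                      # zero-lag cross-correlation
tone_at_shift = line_amplitude(product, SHIFT)   # difference-frequency tone
expected_tone = 0.5 * sum(a * a for a in XI)
```

The tone amplitude does not depend on the random phases, because each line beats with its own displaced copy; this is the coherence that the band-pass detector exploits.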

In Appendix II is computed the power in the various lines at the multiplier output, which is examined now in the neighborhood of A. The desired output signal-to-noise ratio is



ponent is due to the fact that the signal output of the 
detector is a finite-time measurement of a property of a 
random function, (in this case it is a short-time cross- 
correlation’). Therefore, the measurement itself fluctuates 
(although less and less with increased integrating time), 
and this fluctuation is the self-noise. The difference in 
the self-noise component for the two types of detectors 
is due to the fact that in the band-pass detector, spectral 
components symmetrically placed about the difference 
frequency A add on a power basis, whereas in the low- 
pass detector, they are really at the same frequency, and 
thus add on a voltage basis. The other three denominator 
integrand terms are seen to be composed of the con- 
volutions of the corresponding power density spectra. 


PARTICULAR CASES 


The $(S/N)_0$ equations (9) and (11) are applicable to any situation obeying the original assumptions. In order to see what these equations mean in physical terms, let us first make the further assumption that the integrating filter bandwidth is small compared to the signal and noise spectra, which in turn are reasonably continuous functions of $\omega$ within their bandwidth $2\pi W$. This will insure that the output noise spectra are approximately constant across the integrator pass band. Then (9) and (11) for low- and band-pass detectors can be rewritten


2,- (somone 


as lf X(w)Im[H()] ds | \ 


le | I) |? it [X@wX(u + w) | Hu + w) |’ 
+ X(WNi(u + w) + X(u) | HW |’? Nu + A +e) 


BS AEN anni: Abc «| di dah (11) 


SELF-NOISE


Attention should be drawn to the first denominator integrand terms in both (9) and (11), the ones involving only $X$, $|H|$, and $\eta$. These terms signify the presence of fluctuation components in the output, even for both
i(w) and N.(w) identically zero. This self-noise com- 


| “(X') (Hoe + X@)N@ + XC) LH) PNG NIG Nee) ae 


g { fl ” X(e)Re[H(w)] as} 
(3) ee (12) 
PORE) + XONW + XW) | HO) | N) + MON) de 
and 
‘ { i X(o)Re(H(6)] do} +4 ik X()Im[H(@)] do : 
(3) . 7, Ce) 


respectively. In these two equations the effect of the 
integrating filter is given by the single quantity 


Waseke cowie {i Teale de (14) 


which will be recognized as the effective noise bandwidth⁸ of the filter in radians per second. The reciprocal of $W_I/2\pi$, the effective noise bandwidth (in cycles per second), can be called the effective integration time⁹ of the filter since it produces the same $(S/N)_0$ as an ideal integrating filter having a rectangular impulse response of duration $2\pi/W_I$.
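The equivalence between reciprocal noise bandwidth and integration time can be checked in the time domain. The sketch below rests on an assumption, since (14) is not legible in this copy: the low-pass noise bandwidth is taken as the integral of |I(ω)|² over all ω (double-sided) divided by |I(0)|². By Parseval's theorem this gives an effective integration time 2π/W_I = (∫ i dt)² / ∫ i² dt, which is what the hypothetical helper below computes from samples of i(t).

```python
import math


def effective_integration_time(i_samples, dt):
    """Effective integration time of a low-pass filter from its impulse response.

    Uses T_eff = (integral of i)^2 / (integral of i^2), which follows from the
    ASSUMED double-sided noise-bandwidth definition described above.
    """
    area = sum(i_samples) * dt                    # equals I(0)
    energy = sum(v * v for v in i_samples) * dt   # equals (1/2pi) * integral |I|^2
    return area * area / energy


# Ideal integrator: rectangular impulse response of duration T = 2 s.
dt = 0.001
rect = [1.0] * 2000
t_rect = effective_integration_time(rect, dt)

# Single-section RC low-pass filter with time constant tau = 0.5 s.
tau = 0.5
dt_rc = 1e-4
rc = [math.exp(-k * dt_rc / tau) / tau for k in range(100000)]
t_rc = effective_integration_time(rc, dt_rc)
```

Under the stated assumption the rectangular response returns its own duration, and the RC filter returns about twice its time constant.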


⁷ R. M. Fano, "Short-time autocorrelation functions and power spectra," J. Acoust. Soc. Am., vol. 22, pp. 546-50; September, 1950.

⁸ J. L. Lawson and G. E. Uhlenbeck, "Threshold Signals," M.I.T. Radiation Lab. Series, vol. 24, McGraw-Hill Book Co., Inc., New York, N. Y., p. 176; 1954.

⁹ To the knowledge of the author, the equivalence between reciprocal of noise bandwidth and the effective integrating time of a filter (the time duration of an ideal filter having the same effect) was first derived by B. L. Basore (private communication).




One can verify this by substituting the Fourier transform 
of such an ideal impulse response into (14). Ideal impulse 
responses of low-pass and band-pass filters are depicted 
in Fig. 3. 


[Fig. 3. Impulse responses $i(t)$ of ideal integrators: (a) low-pass, (b) band-pass (the phase angle and number of cycles is immaterial for present purposes).]


The $(S/N)_0$ equations (12) and (13) can be applied to a number of even more restrictive cases that are also interesting. As an example, the case where $H(\omega)$ is unity and the three time functions ($x$, $n_1$, and $n_2$) have the same spectral shapes gives

1 ‘, 
Pi P2 


S W, 1 

Gi) = Fe let at 
where $\epsilon$ is a quantity equal to 2 or 1 for low-pass or band-pass detectors, respectively. In these expressions $W_s$ is the effective noise bandwidth of the signal and the two noises, $\rho_1$ and $\rho_2$ are the signal-to-noise power ratios $X(\omega)/N_1(\omega)$ and $X(\omega)/N_2(\omega)$ respectively, and $F$ is a spectrum form factor,


1 
a 
P2 


(15) 


$$F = \frac{\displaystyle\int D^2(\omega)\, d\omega}{D_{\max} \displaystyle\int D(\omega)\, d\omega} \tag{16}$$

some values of which are tabulated in Table I.¹⁰ ($D$ is an arbitrary power density spectrum.)
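The form factor (16) is easy to evaluate numerically for any spectral shape. The following sketch uses a plain Riemann sum on a symmetric grid; the integration range, step size, and the particular normalizations of the spectra (centered at zero, unit width parameters) are arbitrary choices made for illustration, since F does not depend on the center frequency or on the frequency scale.

```python
import math


def form_factor(density, half_range=500.0, step=0.01):
    """Form factor F = integral(D^2) / (D_max * integral(D)) by Riemann sum."""
    n = int(round(half_range / step))
    values = [density((k - n) * step) for k in range(2 * n + 1)]
    area = sum(values) * step
    area_sq = sum(v * v for v in values) * step
    return area_sq / (max(values) * area)


# Spectral shapes written as functions of x = w - w_c (offset from band center).
SPECTRA = {
    "rectangular": lambda x: 1.0 if abs(x) <= 0.5 else 0.0,
    "triangular": lambda x: max(0.0, 1.0 - abs(x)),
    "gaussian": lambda x: math.exp(-x * x),
    "exponential": lambda x: math.exp(-abs(x)),
    "butterworth_1": lambda x: 1.0 / (1.0 + x * x),
    "butterworth_3": lambda x: 1.0 / (1.0 + x ** 6),
}

FORM_FACTORS = {name: form_factor(d) for name, d in SPECTRA.items()}
```

The computed values come out near 1, 2/3, 1/√2, 1/2, 1/2, and 5/6 respectively; the last two agree with 1 − 1/(2n) for Butterworth orders n = 1 and n = 3.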

Another useful situation is that in which the two noises are white with spectral densities $N_{01}$ and $N_{02}$ watts per radian per second; $H(\omega)$ still unity. In this case,

[er af NoNoo We. it (17) 


Kee W, 
where X,n.x 18 the maximum value of the signal power 
density spectrum X(w), and W’ is the bandwidth of the 


We 
= W, 


Not 


ry 
xX max 


Noe 
Dito 


+5 + 


¹⁰ The case originally treated by Fano (footnote 2) used $H(\omega) = e^{j\omega\tau}$, and employed the "single-tuned-circuit" shape of spectrum throughout. Using the appropriate value $F = 1/2$, our (15) reduces to his corresponding results.


TABLE I
VARIOUS VALUES OF THE SPECTRUM FORM FACTOR F APPEARING IN THE OUTPUT SIGNAL-TO-NOISE EXPRESSIONS IN CERTAIN SIMPLE CASES

Type of Spectrum                  Density Function D(ω)                                   Form Factor F(D)
Rectangular                       1 for ω in bandwidth Ω; 0 otherwise                     1
Triangular                        1 − 2|ω − ω_c|/Ω for |ω − ω_c| < Ω/2; 0 otherwise       2/3
Gaussian                          exp[−(ω − ω_c)²]                                        1/√2
Exponential                       exp[−|ω − ω_c|]                                         1/2
First-order Butterworth           [1 + (ω − ω_c)²]⁻¹                                      1/2
(single-tuned RLC circuit)
nth-order Butterworth             [1 + (ω − ω_c)²ⁿ]⁻¹                                     1 − 1/(2n)


two white (rectangular spectrum) noises, $W'$ being assumed large enough to include all of $X(\omega)$. This expression is seen to be similar to (15) except that the input signal-to-noise ratios are redefined in terms of power densities where the signal power density is that at the signal spectrum peak. Also, the form factor $F$ enters in a different fashion.


COMPARISON OF THE TWO TYPES OF CORRELATION DETECTOR


A comparison of (9) and (11) shows that the main difference between the two is in the numerator and denominator $X \times Y$ terms; that is, the output signal and self-noise power, respectively. The first of these proves to be the more interesting. To bring out the difference in behavior of the two types of detector, let the "perturbing filter" be just a pure delay, $\tau$. That is, set $H(\omega) = e^{j\omega\tau}$. Then the signal output voltage, the square root of the numerator of (9), is

$$\int_0^{\infty} X(\omega) \cos \omega\tau\, d\omega \tag{18}$$

or the correlation function, whereas the square root of the numerator of (11) is

$$\left\{\left[\int_0^{\infty} X(\omega) \cos \omega\tau\, d\omega\right]^2 + \left[\int_0^{\infty} X(\omega) \sin \omega\tau\, d\omega\right]^2\right\}^{1/2} \tag{19}$$

or the envelope of the correlation function. That is, if we call the correlation function

$$\phi_x(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)\, x(t + \tau)\, dt = \int_{-\infty}^{\infty} X(\omega) \cos \omega\tau\, d\omega = \int_0^{\infty} X(\omega) \cos \omega\tau\, d\omega \tag{20}$$

(because of $X(\omega)$ being zero on the negative $\omega$ axis), and similarly write the Hilbert transform of the correlation function¹¹

$$\psi_x(\tau) = \int_{-\infty}^{\infty} X(\omega) \sin \omega\tau\, d\omega = \int_0^{\infty} X(\omega) \sin \omega\tau\, d\omega, \tag{21}$$

then (18) is $\phi_x(\tau)$ and (19) is

$$[\phi_x^2(\tau) + \psi_x^2(\tau)]^{1/2}. \tag{22}$$

Fig. 4 illustrates the difference. At (a) is the spectral density of a function whose bandwidth happens to be much smaller than its center frequency. Curve (b) shows the signal output voltage as a function of $\tau$ from a low-pass correlator, and (c) the band-pass detector output.

¹¹ One can imagine $\phi_x(\tau)$ as the projection on the real axis of a complex vector that rotates with $\tau$ at a uniform angular velocity, and $\psi_x(\tau)$ as the projection on the imaginary axis of the same vector.


[Fig. 4. Illustrating the action of a band-pass correlation detector in giving the envelope of the correlation function rather than the correlation function itself as with the low-pass correlation detector. (a) Typical power density spectrum, (b) output voltage of low-pass detector vs delay $\tau$, (c) same for band-pass detector.]
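The difference between (18) and (19) can be reproduced numerically. The sketch below evaluates both integrals by the midpoint rule for a flat, narrow-band spectrum; the band edges (95 to 105 rad/s), the step count, and the function name are illustrative assumptions. For this spectrum the correlation function is a (sin x)/x envelope multiplied by a cosine at the center frequency, while the band-pass output follows the envelope alone.

```python
import math


def correlation_and_envelope(spectrum, w_lo, w_hi, tau, steps=20000):
    """Return (phi, psi, envelope) for delay tau, per (20), (21), and (22)."""
    dw = (w_hi - w_lo) / steps
    phi = 0.0
    psi = 0.0
    for k in range(steps):
        w = w_lo + (k + 0.5) * dw          # midpoint of the k-th interval
        s = spectrum(w)
        phi += s * math.cos(w * tau) * dw  # low-pass detector output, (18)
        psi += s * math.sin(w * tau) * dw  # quadrature component, (21)
    return phi, psi, math.hypot(phi, psi)  # envelope, (19) or (22)


def flat_band(w):
    """Unit-height spectrum, 10 rad/s wide, centered at 100 rad/s (hypothetical)."""
    return 1.0 if 95.0 <= w <= 105.0 else 0.0


phi0, psi0, env0 = correlation_and_envelope(flat_band, 95.0, 105.0, 0.0)
phi1, psi1, env1 = correlation_and_envelope(flat_band, 95.0, 105.0, 0.1)
```

At zero delay the two outputs coincide; at a delay of 0.1 s the low-pass output has swung negative while the envelope has dropped only slightly, which is the behavior sketched in Fig. 4.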


CONCLUSION


A correlation detector is defined here as a set of circuit elements that forms the integrated, or averaged, product of two time functions which differ in that one or both has noise added to it and also has been perturbed by passage through a known linear "perturbing filter" characteristic $H(\omega)$. Expressions were derived for the output signal-to-noise ratio of two types of correlation detector: 1) the low-pass detector, in which the integrator is an arbitrary low-pass filter, (9), and 2) the band-pass correlation detector, (11), in which one of the two input functions was deliberately displaced in frequency by A, and the integrator is therefore an arbitrary band-pass filter tuned to A. It was assumed that signal and noises are Gaussian.¹²

The output signal-to-noise ratio is a function of the 
signal and noise power density spectra, and the system 
functions of the linear filters in the two inputs and the 
narrow-band integrating filter. In certain simple cases, 
the output signal-to-noise ratio can be expressed more 
easily using the effective noise bandwidth of the integrat- 
ing filter and a certain spectrum form factor, [(16) and 
Table I]. The expressions derived for output signal-to- 
noise ratio extend previously available results to include 
arbitrary signal, noise, and integrating filter spectral 
shapes and the presence of the perturbing filter. 

It was also shown that although the operations performed by the band-pass type of detector are considerably different from the true mathematical operation of correlation, the signal-to-noise ratio results are substantially the same, and the signal output as a function of a relative delay between input signals is the (suitably defined) envelope of the correlation function, (22), rather than the correlation function itself, (18).¹³


APPENDIX I


LOW-PASS DETECTOR CALCULATIONS


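When $u_1(t)$ and $u_2(t)$ [defined as Fourier expansions in (1) through (5)] are multiplied together (Fig. 1), there are four sets of product terms, the $X \times Y$ terms (I), the $X \times N_1$ terms (II), the $Y \times N_2$ terms (III), and the $N_1 \times N_2$ terms (IV). The multiplier output voltage at $\omega = 0$ for the first of these is

$$V_{\mathrm{I}}(0) = \frac{1}{2} \sum_i \xi_i^2 h_i \cos \eta_i \tag{23}$$

and the power is

$$P_{\mathrm{I}}(0) = V_{\mathrm{I}}^2(0) = \frac{1}{4} \sum_i \xi_i^4 h_i^2 \cos^2 \eta_i + \frac{1}{4} \sum_i \xi_i^2 h_i \cos \eta_i \sum_{j \neq i} \xi_j^2 h_j \cos \eta_j. \tag{24}$$

(Summations will be assumed to run from 1 to $\theta W$.) The ensemble average power in this spectral line is

$$\overline{P_{\mathrm{I}}(0)} = \frac{1}{4} \sum_i \overline{\xi_i^4}\, h_i^2 \cos^2 \eta_i + \frac{1}{4} \sum_i \overline{\xi_i^2}\, h_i \cos \eta_i \sum_{j \neq i} \overline{\xi_j^2}\, h_j \cos \eta_j. \tag{25}$$

The bar represents the ensemble average.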


¹² Only the self-noise terms are incorrect if this assumption is violated, so long as a proportional relationship like (6) holds. (Our self-noise results depend on the statistical independence of Fourier coefficients for $i \neq j$.)

¹³ The reviewer has kindly pointed out that the band-pass detector performance could be obtained with the low-pass detection scheme as follows: A second low-pass detector is constructed identical to that in Fig. 1, except that $H(\omega)$ is replaced by its complex conjugate. The output voltages of the two low-pass detectors are then combined by squaring each, adding, and then taking the square root of the sum.




For frequencies different from zero (a = 27n/0, n = 


Qrnt 
cos (2a! + ¢ 


sy bere ho fe cos (7 aut SOE Seiten 6.) 


Dien == NQitn —_ 6) 


2rnt 


COs 


i Bare Pac COs A, ar Evingsh; cos Bin) 


Nile 


— si np 2mm De (Ef hice sim A + £s:ck:h, Sin Bs) (26) 


van the abbreviations 4;, = (6:1. + Misn — $:) and 

in = (Oi+n — 2; — $;). The power in each of these 
Ae Fe lines at nonzero frequency is one half the square 
of the amplitude, the latter being the sum of the squares 
of the two summations in (26). 


p,(22") = 5 » [Eesindien COS Ase ata Esingshi cos Bar 


a (E;€;snlisn sin As, 4 Evanésh; sin Bs): 


+ other terms, each of which is the product of cosine 
or sine of A,, or B;, with cosine or sine of A,, or 
Beewith 454 4; (27) 


In taking the ensemble average of this expression, each of 
the last-named terms becomes zero. To show this, one 
can make a decomposition of a typical term, say cos A;, 
cos B,,, into a number of subterms, each of which contains 
COS $;o;; OF Sin ¢;,,; aS a factor. Because the phase co- 
efficients ¢; and ¢; (¢ #* J) are independent, and dis- 
tributed with uniform probability over the interval 
(— 7, 1), these ensemble averages are zero. We thus have 


Pi) = § DE Ot ahha 
cos (ny + msn) + hisn) Via One eS) 
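Going on to the $X \times N_1$ term at zero frequency,

$$V_{\mathrm{II}}(0) = \frac{1}{2} \sum_i \nu_i \xi_i \cos(\phi_i - \psi_i) \tag{29}$$

from which

$$P_{\mathrm{II}}(0) = \frac{1}{4} \sum_i \nu_i^2 \xi_i^2 \cos^2(\phi_i - \psi_i) + \frac{1}{4} \sum_i \nu_i \xi_i \cos(\phi_i - \psi_i) \sum_{j \neq i} \nu_j \xi_j \cos(\phi_j - \psi_j). \tag{30}$$

But $\overline{\cos(\phi_i - \psi_i)}$ is zero for any $i$, since $\phi_i$ and $\psi_i$ are independent ($x(t)$ and $n_1(t)$ being independent) and uniformly distributed over the interval $(-\pi, \pi)$. Therefore only the first term of (30) remains, and

$$\overline{P_{\mathrm{II}}(0)} = \frac{1}{8} \sum_i \overline{\nu_i^2}\; \overline{\xi_i^2}. \tag{31}$$

Similarly

$$\overline{P_{\mathrm{III}}(0)} = \frac{1}{8} \sum_i \overline{\mu_i^2}\; \overline{\xi_i^2}\, h_i^2 \tag{32}$$

and

$$\overline{P_{\mathrm{IV}}(0)} = \frac{1}{8} \sum_i \overline{\nu_i^2}\; \overline{\mu_i^2}. \tag{33}$$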


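As for the $X \times N_1$ term away from zero frequency, we have

$$V_{\mathrm{II}}\!\left(\frac{2\pi n}{\theta}\right) = \frac{1}{2} \sum_i \left[\nu_i \xi_{i+n} \cos\!\left(\frac{2\pi n t}{\theta} + \phi_{i+n} - \psi_i\right) + \nu_{i+n} \xi_i \cos\!\left(\frac{2\pi n t}{\theta} + \psi_{i+n} - \phi_i\right)\right] \tag{34}$$

from which

$$\overline{P_{\mathrm{II}}\!\left(\frac{2\pi n}{\theta}\right)} = \frac{1}{8} \sum_i \left(\overline{\xi_{i+n}^2}\; \overline{\nu_i^2} + \overline{\nu_{i+n}^2}\; \overline{\xi_i^2}\right) \tag{35}$$

by the same steps as those leading from (26) to (27). Similarly for the $Y \times N_2$ term

$$\overline{P_{\mathrm{III}}\!\left(\frac{2\pi n}{\theta}\right)} = \frac{1}{8} \sum_i \left(\overline{\mu_i^2}\; \overline{\xi_{i+n}^2}\, h_{i+n}^2 + \overline{\mu_{i+n}^2}\; \overline{\xi_i^2}\, h_i^2\right) \tag{36}$$

and the $N_1 \times N_2$ term

$$\overline{P_{\mathrm{IV}}\!\left(\frac{2\pi n}{\theta}\right)} = \frac{1}{8} \sum_i \left(\overline{\nu_i^2}\; \overline{\mu_{i+n}^2} + \overline{\nu_{i+n}^2}\; \overline{\mu_i^2}\right). \tag{37}$$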

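Now that all four components of the multiplier output spectrum have been computed, they must be operated on by the integrating filter's response $|I(\omega)|^2$. Treating the signal component [second term of (25)] in this way, and then taking the limit as $\theta \to \infty$, we have the output signal power:

$$S_0 = \lim_{\theta \to \infty} |I(0)|^2\, \frac{1}{4} \sum_i \overline{\xi_i^2}\, h_i \cos \eta_i \sum_{j \neq i} \overline{\xi_j^2}\, h_j \cos \eta_j \tag{38}$$

$$= \lim_{\Delta\omega \to 0} |I(0)|^2 \sum_i X_i h_i \cos \eta_i\, \Delta\omega \sum_{j \neq i} X_j h_j \cos \eta_j\, \Delta\omega, \qquad \Delta\omega = 2\pi/\theta \tag{39}$$

from (6), where $X_i = X(\omega_i)$,

$$= |I(0)|^2 \left[\int_0^{\infty} X(\omega)\, |H(\omega)| \cos \eta(\omega)\, d\omega\right]^2 = |I(0)|^2 \left[\int_0^{\infty} X(\omega)\, \mathrm{Re}[H(\omega)]\, d\omega\right]^2 \tag{40}$$

where the abbreviation "Re" means "real part of." The fluctuation power $N_0$ at the output can be found by multiplying $|I(2\pi n/\theta)|^2$ by all the corresponding multiplier output terms save only the second term of (25) and going to the limit as $\theta \to \infty$.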


5) [aPe) +05) 


oo) 


ve = lim >. 


b>0 n=0 


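where the superscript * indicates the omission of the last term when $n = 0$. A typical term in this expression, say the last one, becomes from (37),

$$\lim_{\theta \to \infty} \sum_n \left|I\!\left(\frac{2\pi n}{\theta}\right)\right|^2 \overline{P_{\mathrm{IV}}\!\left(\frac{2\pi n}{\theta}\right)} = \lim_{\Delta\omega \to 0} \frac{1}{2} \sum_n |I(n\,\Delta\omega)|^2\, \Delta\omega \sum_i \left[N_1(\omega_i) N_2(\omega_{i+n}) + N_1(\omega_{i+n}) N_2(\omega_i)\right] \Delta\omega$$

$$= \frac{1}{2} \int_0^{\infty} |I(\omega)|^2\, d\omega \int_0^{\infty} [N_1(u)\, N_2(u + \omega) + N_1(u + \omega)\, N_2(u)]\, du. \tag{42}$$

The final signal-to-noise ratio $(S/N)_0$ is the ratio of expressions (40) and (41), namely (9).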

It should be clear now how the second term in (25) was recognized as the output signal term. In using the limiting relations (6) and (7) for replacing Fourier coefficients with spectral densities, the first term of (25) becomes an integral times $\Delta\omega$, which is approaching zero. The second term, however, is an integral squared and remains as a constant spectral line at the origin during the limiting process.
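This limiting behavior can be seen numerically by evaluating the two terms of (25) with the averages (6) and (7) inserted, for increasing Fourier period. The sketch below assumes a unit perturbing filter (so every h_i cos η_i equals 1) and a flat spectrum of unit density over a band 10 rad/s wide; these values and the function names are illustrative, not taken from the paper.

```python
import math


def dc_line_terms(spectrum, w_lo, w_hi, theta):
    """Return (first term, second term) of (25) using (6) and (7), with H = 1."""
    dw = 2.0 * math.pi / theta                   # line spacing
    m = int(round((w_hi - w_lo) / dw))           # number of lines in the band
    xs = [spectrum(w_lo + (i + 0.5) * dw) for i in range(m)]
    mean_sq = [2.0 * x * dw for x in xs]                 # (6)
    mean_fourth = [8.0 * x * x * dw * dw for x in xs]    # (7)
    first = 0.25 * sum(mean_fourth)
    total = sum(mean_sq)
    # Double sum over i and j != i, written as sum_i a_i * (total - a_i).
    second = 0.25 * sum(a * (total - a) for a in mean_sq)
    return first, second


def flat(w):
    """Unit power density everywhere in the band (hypothetical)."""
    return 1.0


first_short, second_short = dc_line_terms(flat, 10.0, 20.0, 2.0 * math.pi * 10)
first_long, second_long = dc_line_terms(flat, 10.0, 20.0, 2.0 * math.pi * 100)
```

Lengthening the period tenfold cuts the first term by a factor of ten, while the second term stays near the square of the integral of X(ω), here 100.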


APPENDIX II 


BAND-PASS DETECTOR CALCULATIONS
_ The calculations for the band-pass detector are con- 
iderably simpler than for the low-pass case. When the 
lignals x(t) and y(/) are multiplied, the voltage at the 
lifference frequency A is 
1 

| Vil) = 5 2 Eths cos (At + 1.) (43) 
and the power is one half the squared amplitude, giving
as the ensemble average,


5 be Eth; Cos 0 | 


2 


ae ; ps Eh, sin n | 


: > eh} + ; > Eh; cos qn, D> Eh; cos n; 
i i iA; 


> #h,; sin n;. (44) 


TAQ 


+ : Ss Ph, sin 7; 


For frequencies different from A (o = A + 2mn/6 


Nal Pe IV) 


Green: The Output Signal-to-Noise Ratio of Correlation Detectors ile 


2rnt 


v(a ae 22) = ; De Eee linen COS (a1 + on 


e 
-+- Ditn =. i = nies) 


a ; COS (a1 -- =) De te et (Nore 


‘COs Chas ome d; + Nitn) 


Qrnt 


al 2 sin (a1 =F 6 ) a Be od (lac 


‘sin (Oz 5 ae d; aE Nisn) (45) 


from which 


r{ar®) 


| 2 2 5 
8 oe & EF ele an l COS” (iin a d; ar Nin) 


+ sin” iin ee ys ei Ni+n) | 
il 232 1.2 
= 9 Lotion: (46) 


In a similar fashion, we have for the X X N, term at 
any frequency, including A, 


yy il 
Vala aie a =e 2 Dy Vinki 


Qrnt 


“COS (a1 + <p a Yon 6.) (47) 
where $n = 0, \pm 1, \cdots, \pm\infty$. From this
ws lecme iy lee eee ae 

P(A + 2 = g DS ier eo (48) 


The Y X N, and N, X N, terms are computed similarly: 


Dy, 
Pri(a =i 2am = 


: 2 Boy 
Pr(a ate 2 rr. Ds ap fe 
This time, the signal power is represented by the last
two of the three terms in (44). Upon multiplying this by
the integrating filter response and letting $\theta \to \infty$


| 14) |? [3 2D Xhi 


Sacked, Gaia (49) 


OO |r 


(50) 


00 |r 


Sy = 


*COS n; Aw Ss X ;h; cos n; Aw 


i#t 


Du exa he sin nite | 


jAt 


+ ; > Ales sin 7; Aw 


; [ie X(o)Re[H(o)] as | 


en Ue ‘Gute (el te | ans) 


The output noise power is 
x | c lo _ 
No = lim 7 (a a a) | | (a a st) 
en id Goer 
au ATT, + Pin A+; 
be aear (52) 


where the superscript * indicates omission of the last 
two terms when n = 0. 




No an | Iw) ily [X (wu) X(u +- w) | Hu + w) ke 


+ X(u)Niu + @) 
+ Xu +) | Hu +) |’ Nou + A+ @) 
+ Ni@w)Ni(u + A + w)] du dw 
The ratio of (51) to (53) is the desired (S/N)o, (11). 
ACKNOWLEDGMENT 


The author would like to express his gratitude to Drs.
B. L. Basore, W. B. Davenport, Jr., R. M. Fano, and R.
Price for much of the motivation for this study, and for
many helpful suggestions.


Error Rates in Pulse Position Coding*


L. LORNE CAMPBELL†


Summary—An expression for the error rate in a system using a
binary pulse position code is derived. In the system considered,
the pulses amplitude modulate a carrier and the resultant signal
is contaminated by additive Gaussian noise. At the receiver the
pulses are recovered by an envelope detector. If synchronization
errors and post-detection filtering are neglected, it is shown that
the probability of a binary error is approximated well by
$\tfrac{1}{2}\exp(-a^2/2)$, where $a^2$ is the peak input signal-to-noise power ratio.
Finally, the error rate is derived for the case where the signal
amplitude is subject to random fading. Some comparisons are
made with error rates derived by Montgomery for other systems
with and without carrier fading. It is found that when the signal is
subject to fading the pulse position system is better than a
comparable system using threshold detection.


LIST OF PRINCIPAL SYMBOLS


$T$ = frame duration.

$F$ = carrier frequency.

$w(f)$ = power spectrum of input noise.

$P$ = pulse amplitude.

$P_0$ = root-mean-square pulse amplitude with Rayleigh fading.

$\psi_0 = \int_0^\infty w(f)\,df$ = total input noise power.

$r$ = envelope of relative autocorrelation function of
input noise for a delay $T/2$.

$a^2 = P^2/(2\psi_0)$ = input signal-to-noise power ratio at
the pulse peak.

$a_0^2 = P_0^2/(2\psi_0)$ = mean signal-to-noise power ratio
with Rayleigh fading.

$L_n^{\alpha}(x)$ = generalized Laguerre polynomial.

$I_0^{(k)}(x)$ = kth derivative of modified Bessel function of
order zero.

$p$ = probability of error in one binary digit in the
pulse position system with steady signal.

$p_c$ = probability of error in a character in a five-digit code.

$p_T$ = probability of error in envelope detection,
fixed threshold system with no fading.

$p_F$ = probability of error in pulse position system
with Rayleigh fading.

$p_{TF}$ = probability of error in envelope detection, fixed
threshold system with Rayleigh fading.

$p_{TS}$ = probability of error in synchronous detection
system with Rayleigh fading.


* Manuscript received by the PGIT, September 18, 1956.
† Radio Physics Lab., Def. Res. Board, Ottawa, Canada. Work
performed under project PCC No. D48-28-35-05.


INTRODUCTION


Binary pulse position coding appears to offer
some advantages over a simple on-off pulse code,
particularly when the signal amplitude is subject
to large fluctuations. The object of this paper is to estimate
the frequency of errors in a communication system using
this form of coding.

In the system considered here, the time scale is divided
into frames of duration $T$. A pulse is transmitted in either
the first half or the second half of the frame. The position
of the pulse in the frame then conveys one binary digit
of information. In this paper, we consider a communication
system in which these pulses are used to amplitude
modulate a radio frequency carrier. At the receiver the
pulses are regained at the output of an envelope detector.
The decision as to whether the pulse is in the first or the
second half of the frame is made by comparing the outputs
of the detector at the times $T/4$ and $3T/4$, measured
from the beginning of the frame. The pulse is then said
to be in the first or second position according as the output
at the time $T/4$ or $3T/4$ is greater.

Errors may occur if the transmission link is noisy. The
effect of additive white Gaussian noise will be considered
here. The spectrum of the noise at the input to the detector
is then determined by the predetection filters in the
receiver. It will also be assumed here that the receiver
knows when each frame begins. This information may be
provided on a parallel synchronization channel or by
other means.


DERIVATION OF ERROR RATE—UNCORRELATED NOISE

Let the input to the detector be

$$V(t) = \begin{cases} P\cos 2\pi F t + V_N(t), & 0 < t < T/2, \\ V_N(t), & T/2 < t < T. \end{cases} \tag{1}$$

In (1), $V_N(t)$ is a narrow-band Gaussian noise voltage
with power spectrum $w(f)$, and $P\cos 2\pi F t$, for
$0 < t < T/2$, is a square pulse of amplitude $P$. This
input is what we would have if the pulse is transmitted in
the first half of the frame $0 < t < T$. If the pulse were in the
second half the analysis would be exactly the same. The
analysis would not be affected if the pulse is shaped,
provided that the pulse amplitude at $t = T/4$ is $P$ and
at $t = 3T/4$ is zero.

Let the output of the envelope detector be $R(t)$. Then
if $R(T/4) > R(3T/4)$ we say that the pulse is in the
first half-frame and conversely. Thus, for the input (1),
an error is made if $R(T/4) < R(3T/4)$.

If the noise voltage, $V_N(t)$, is confined to a spectral
band whose width is small compared with $F$ we can write

$$V(t) = \begin{cases} [A(t) + P]\cos 2\pi F t + B(t)\sin 2\pi F t, & 0 < t < T/2, \\ A(t)\cos 2\pi F t + B(t)\sin 2\pi F t, & T/2 < t < T, \end{cases} \tag{2}$$

where $A(t)$ and $B(t)$ are normally distributed and are
slowly varying compared with $\cos 2\pi F t$. The output of
the detector is then

$$R(t) = \begin{cases} \{[A(t) + P]^2 + B^2(t)\}^{1/2}, & 0 < t < T/2, \\ [A^2(t) + B^2(t)]^{1/2}, & T/2 < t < T. \end{cases} \tag{3}$$


We shall first derive an expression for the error rate on
the assumption that the input noise at $t = T/4$ is effectively
uncorrelated with the input noise at $t = 3T/4$. Let

$$R_1 = R(T/4), \qquad R_2 = R(3T/4). \tag{4}$$

According to the well-known result on the output of
envelope detectors,¹ the probability density of $R_1$ is

$$\frac{R_1}{\psi_0}\exp\left(-\frac{R_1^2 + P^2}{2\psi_0}\right) I_0\!\left(\frac{R_1 P}{\psi_0}\right)$$

and the probability density of $R_2$ is

$$\frac{R_2}{\psi_0}\exp\left(-\frac{R_2^2}{2\psi_0}\right),$$

where $\psi_0$, the total input noise power, is given by

$$\psi_0 = \int_0^\infty w(f)\,df. \tag{5}$$

Since, by hypothesis, $R_1$ and $R_2$ are uncorrelated, the
joint probability density of $R_1$ and $R_2$, $F(R_1, R_2)$, is given
by

$$F(R_1, R_2) = \frac{R_1 R_2}{\psi_0^2}\exp\left(-\frac{R_1^2 + R_2^2 + P^2}{2\psi_0}\right) I_0\!\left(\frac{R_1 P}{\psi_0}\right). \tag{6}$$

Since an error occurs if $R_2 > R_1$, the probability of an
error in a frame, $p$, is given by

$$p = \int_0^\infty dR_1 \int_{R_1}^\infty dR_2\, F(R_1, R_2). \tag{7}$$


¹ S. O. Rice, “Mathematical analysis of random noise,” Bell
Sys. Tech. J., vol. 24, pp. 46-156; January, 1945. See section 3.10.


This integral is easily evaluated [see (51) in the Appendix]
to give

$$p = \tfrac{1}{2}\exp\left(-\frac{a^2}{2}\right), \tag{8}$$

where $a^2$ is the signal-to-noise power ratio in the input
at $t = T/4$. That is,

$$a^2 = P^2/2\psi_0. \tag{9}$$

Eq. (8) gives a simple approximation to the probability
of error in a frame on the assumption that the noise at
the two sampling positions is uncorrelated. It will be
shown later that (8) remains a good approximation in
many cases in which correlation might be expected to
play a part.
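The result (8) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes $\psi_0 = 1$ (so that $P = \sqrt{2a^2}$), draws independent Gaussian samples for $A$ and $B$ at the two sampling instants, forms the envelopes as in (3), and counts how often $R_2 > R_1$. The function names, trial count, and seed are illustrative choices.

```python
import math
import random


def error_probability_uncorrelated(a_squared):
    """Closed form (8): p = (1/2) exp(-a^2 / 2)."""
    return 0.5 * math.exp(-a_squared / 2.0)


def simulate_error_rate(a_squared, trials=100_000, seed=1):
    """Monte Carlo estimate of P[R2 > R1] for uncorrelated noise samples.

    Assumption of this sketch: psi_0 = 1, so A and B have unit variance
    and the pulse amplitude is P = sqrt(2 * a_squared).
    """
    rng = random.Random(seed)
    pulse = math.sqrt(2.0 * a_squared)
    errors = 0
    for _ in range(trials):
        # Envelope at T/4 (pulse present) and at 3T/4 (noise only), as in (3).
        r1 = math.hypot(rng.gauss(0.0, 1.0) + pulse, rng.gauss(0.0, 1.0))
        r2 = math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        if r2 > r1:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    for a2 in (1.0, 2.0, 4.0):
        print(a2, error_probability_uncorrelated(a2), simulate_error_rate(a2))
```

With a few hundred thousand trials the simulated rate should agree with (8) to within ordinary sampling fluctuation; very small error rates need correspondingly more trials.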


DERIVATION OF ERROR RATE—CORRELATED NOISE


A more accurate expression for the error probability
is obtained if we take account of the correlation between
$R_1$ and $R_2$. To this end we let

$$x_1 = A(T/4) + P, \quad y_1 = B(T/4), \quad x_2 = A(3T/4), \quad y_2 = B(3T/4).$$

Then, by a simple modification of the result given by
Rice² for the case of no signal, the joint probability
density function of $x_1$, $x_2$, $y_1$, $y_2$ is given by

$$p_4(x_1, x_2, y_1, y_2) = [4\pi^2(\psi_0^2 - \mu_{13}^2 - \mu_{14}^2)]^{-1} \exp\left\{-\frac{\psi_0[(x_1 - P)^2 + x_2^2 + y_1^2 + y_2^2] - 2\mu_{13}[(x_1 - P)x_2 + y_1 y_2] - 2\mu_{14}[(x_1 - P)y_2 - x_2 y_1]}{2(\psi_0^2 - \mu_{13}^2 - \mu_{14}^2)}\right\}, \tag{10}$$

where

$$\mu_{13} = \int_0^\infty w(f)\cos \pi(f - F)T\,df, \tag{11}$$

and

$$\mu_{14} = \int_0^\infty w(f)\sin \pi(f - F)T\,df. \tag{12}$$


² Ibid., section 3.7.


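We now make the substitutions

$$x_1 = R_1\cos\theta_1, \quad y_1 = R_1\sin\theta_1, \quad x_2 = R_2\cos\theta_2, \quad y_2 = R_2\sin\theta_2.$$

Then $R_1$ and $R_2$ represent the detector outputs at the
times $T/4$ and $3T/4$ respectively. $R_1$ and $R_2$ have the
joint probability density function $F(R_1, R_2)$, where

$$F(R_1, R_2) = R_1 R_2 \int_0^{2\pi} d\theta_1 \int_0^{2\pi} d\theta_2\; p_4(R_1\cos\theta_1, R_2\cos\theta_2, R_1\sin\theta_1, R_2\sin\theta_2). \tag{13}$$

Since an error occurs if $R_2 > R_1$, the probability of error,
$p$, is given, as before, by

$$p = \int_0^\infty dR_1 \int_{R_1}^\infty dR_2\, F(R_1, R_2). \tag{7}$$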
0 Ri 


The details of the calculation of $p$ are left to the
Appendix. The result is that

$$p = \sum_{n=0}^{\infty} f_n r^{2n}, \tag{14}$$

where

$$r = \frac{(\mu_{13}^2 + \mu_{14}^2)^{1/2}}{\psi_0} \tag{15}$$

and

$$f_n = 2e^{-a^2}\int_0^\infty \rho\, e^{-2\rho^2}\,[L_n(\rho^2) - L_{n-1}(\rho^2)] \sum_{k=0}^{n} \frac{(2a\rho)^k}{k!}\, L_{n-k}^{k}(a^2 + \rho^2)\, I_0^{(k)}(2a\rho)\, d\rho. \tag{16}$$

The functions $L_n^{\alpha}(x)$ are the generalized Laguerre polynomials
and are defined by

$$L_n^{\alpha}(x) = \frac{e^{x} x^{-\alpha}}{n!}\,\frac{d^n}{dx^n}\left(e^{-x} x^{n+\alpha}\right). \tag{17}$$

$L_n^{0}(x)$ will often be written $L_n(x)$. Also, in (16),

$$I_0^{(k)}(x) = \frac{d^k}{dx^k} I_0(x), \tag{18}$$

where $I_0(x)$ is the modified Bessel function of order zero.
The expansion (14) for $p$ holds for $|r| < 1$. Ordinarily,
$r$ will be quite small compared with unity and a small
number of terms of the series gives a satisfactory approximation.
The quantity $r$ is related closely to the autocorrelation
function of the noise and, in general, decreases
with increasing $T$.

The first two coefficients of the series (14), $f_0$ and $f_1$,
are given by

$$f_0 = \tfrac{1}{2}\exp\left(-\frac{a^2}{2}\right) \tag{19}$$

and

$$f_1 = \frac{(a^4 - 4a^2)}{32}\exp\left(-\frac{a^2}{2}\right). \tag{20}$$

As shown in the Appendix, an upper bound to the
absolute values of $f_n$ can be found. This permits us to
estimate the accuracy of the value of $p$ which is obtained
from the first few terms of the expansion (14). The bound
is given by inequality

$$|f_n| \le 2\exp\left(-\frac{a^2}{4}\right). \tag{21}$$

From (14), (19), (20), and (21), we have

$$p = \tfrac{1}{2}\left[1 + \frac{r^2}{16}\,a^2(a^2 - 4)\right]\exp\left(-\frac{a^2}{2}\right) + O, \tag{22}$$

where

$$|O| \le \sum_{n=2}^{\infty} 2r^{2n}\exp\left(-\frac{a^2}{4}\right) = \frac{2r^4}{1 - r^2}\exp\left(-\frac{a^2}{4}\right). \tag{23}$$

When $r = 0$, (22) reduces to the simpler (8).


DISCUSSION OF RESULTS


In order to appreciate these results it is necessary to
have some idea of the magnitude of the parameter $r$. We
note first that $r$ is the envelope of the relative autocorrelation
function of the input noise evaluated for a delay of
$T/2$. That is,

$$r = \frac{\text{envelope of } \psi(T/2)}{\psi(0)}, \tag{24}$$

where $\psi(\tau)$ is the autocorrelation function of the input
noise. This autocorrelation function is given by

$$\psi(\tau) = \int_0^\infty w(f)\cos 2\pi f\tau\, df. \tag{25}$$

For example, let us assume that the input noise has a
rectangular spectrum centered on the frequency $F$.
That is,

$$w(f) = \begin{cases} 0 & \text{for } 0 < f < F - W/2, \\ N_0 & \text{for } F - W/2 < f < F + W/2, \\ 0 & \text{for } F + W/2 < f. \end{cases} \tag{26}$$

If we substitute these values in (5), (11), and (12) we have

$$\mu_{13} = \frac{2N_0}{\pi T}\sin\frac{\pi W T}{2}, \qquad \psi_0 = N_0 W, \qquad \mu_{14} = 0. \tag{27}$$

Then

$$a^2 = \frac{P^2}{2N_0 W} \tag{28}$$

and

$$r = \frac{2}{\pi W T}\left|\sin\frac{\pi W T}{2}\right|. \tag{29}$$


Now the frame frequency is $T^{-1}$. Since the pulses are
confined to a duration which is closer to $T/2$, the signal
function will have a strong harmonic component of
frequency $2T^{-1}$. Since this signal amplitude modulates a
carrier, the input bandwidth must be at least $4T^{-1}$ if
the important sidebands are to be included. If the pulses
are not well shaped a larger bandwidth might be necessary.
Thus the magnitude of $r$ will usually be $(2\pi)^{-1}$ (= 0.159)
or less. Calculations with other shapes of input spectra
indicate that $r$ will usually be much less than this.
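The size of $r$ for the rectangular spectrum can be tabulated directly. The sketch below assumes the form $r = (2/\pi W T)\,|\sin(\pi W T/2)|$ for the rectangular spectrum of bandwidth $W$, and checks that $r$ never exceeds $(2\pi)^{-1}$ once $WT \ge 4$. The function name and the grid of $WT$ values are illustrative, not from the paper.

```python
import math


def r_rectangular(wt):
    """Envelope of the relative autocorrelation at delay T/2.

    Assumed form for a rectangular noise spectrum of bandwidth W:
        r = (2 / (pi W T)) |sin(pi W T / 2)|,  with wt = W * T > 0.
    """
    x = math.pi * wt / 2.0
    return abs(math.sin(x)) / x


if __name__ == "__main__":
    bound = 1.0 / (2.0 * math.pi)
    for wt in (4.0, 4.5, 5.0, 6.0, 7.0, 8.0):
        print(f"WT = {wt:4.1f}   r = {r_rectangular(wt):.4f}   bound = {bound:.4f}")
```

Because $|\sin x| \le 1$, the envelope is at most $2/(\pi W T)$, which equals $(2\pi)^{-1}$ at $WT = 4$ and decreases for larger bandwidths.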

When $r = 0.16$, the simple expression

$$p = \tfrac{1}{2}\exp\left(-\frac{a^2}{2}\right) \tag{8}$$

is an excellent approximation to the probability of error
for a < 10. The approximation remains quite good even
for larger values of a.
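How much the correlation changes the error rate can be gauged from the first two terms of the series (14). The sketch below is illustrative only: it assumes $f_0 = \tfrac12 \exp(-a^2/2)$, takes $f_1 = (a^4 - 4a^2)\exp(-a^2/2)/32$ (the value obtained by carrying out the integration in (16) for $n = 1$), and uses the bound $|f_n| \le 2\exp(-a^2/4)$ for the neglected terms. All function names are hypothetical.

```python
import math


def f0(a_sq):
    """Leading coefficient of the series (14), equal to the uncorrelated result (8)."""
    return 0.5 * math.exp(-a_sq / 2.0)


def f1(a_sq):
    """Second coefficient, assumed here in the form (a^4 - 4 a^2) exp(-a^2/2) / 32."""
    return (a_sq * a_sq - 4.0 * a_sq) / 32.0 * math.exp(-a_sq / 2.0)


def error_first_order(a_sq, r):
    """Two-term approximation p ~ f0 + f1 r^2."""
    return f0(a_sq) + f1(a_sq) * r * r


def remainder_bound(a_sq, r):
    """Bound on the neglected terms n >= 2, using |f_n| <= 2 exp(-a^2/4)."""
    return 2.0 * r ** 4 / (1.0 - r * r) * math.exp(-a_sq / 4.0)


if __name__ == "__main__":
    r = 0.16
    for a_sq in (1.0, 4.0, 10.0):
        rel = (error_first_order(a_sq, r) - f0(a_sq)) / f0(a_sq)
        print(f"a^2 = {a_sq:5.1f}  p0 = {f0(a_sq):.3e}  relative change = {rel:+.3f}"
              f"  remainder bound = {remainder_bound(a_sq, r):.3e}")
```

Under these assumptions the first-order correction vanishes at $a^2 = 4$ and is about 10 per cent of $p$ at $a^2 = 10$ when $r = 0.16$.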

The probability, $p$, of error in a frame is plotted against
the peak signal-to-noise ratio, $a^2$, for $r = 0$ in Fig. 1. For
the range of values of $a$ in this figure the curve for $r = 0.16$
is practically indistinguishable from the curve shown.
The curve shows that a signal-to-noise ratio of a little
more than 10 db will give a satisfactory error rate.

It seems likely that if some form of post-detection
filtering or integration is introduced before the decision
as to the position of the pulse is made, the probability of
error should be decreased. In principle, the error probability
with arbitrary post-detection filtering could be
obtained from the general results of Meyer and Middleton.³
However, the problem of solving the associated
integral equations seems to be difficult except in the
simple case which is treated more directly in the present
paper.

Finally, it should be pointed out that $p$ is the probability
of error in a single frame. In a five-digit code,
the probability of error in a character, $p_c$, is given by

$$p_c = 1 - (1 - p)^5. \tag{30}$$

When $p$ is small this becomes

$$p_c \approx 5p. \tag{31}$$
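As a small worked illustration of (30) and (31), the sketch below evaluates the exact character error probability and its small-$p$ approximation side by side. The function name and the sample values of $p$ are illustrative.

```python
def character_error_probability(p, digits=5):
    """Probability that at least one of `digits` independent binary digits is in error."""
    return 1.0 - (1.0 - p) ** digits


if __name__ == "__main__":
    for p in (1e-1, 1e-2, 1e-3, 1e-4):
        exact = character_error_probability(p)
        print(f"p = {p:.0e}   exact p_c = {exact:.6e}   5p = {5 * p:.6e}")
```

The two columns agree closely once $p$ is below about $10^{-2}$, which is the regime of practical interest here.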


ERROR RATE WITH A FADING SIGNAL


In this section we present a brief discussion of the
error rate of a system using pulse position coding when the
signal amplitude is subject to fluctuations. It seems clear
that a detection system of this type should be better than
a system which uses threshold detection of single pulses
when the signal is fading.


³ M. A. Meyer and D. Middleton, “On the distribution of signals
and noise after rectification and filtering,” J. Appl. Phys., vol. 25,
pp. 1037-1052; August, 1954.


Fig. 1—The probability of error as a function of input signal-to-noise
ratio in a pulse position system. No carrier fading. (Axes:
probability of error versus input signal-to-noise ratio, 0 to 15 db.)

Montgomery⁴ has calculated the error rates for several
other methods of modulation and binary coding when the
signal is fading. Like Montgomery, we assume that the
values of signal amplitude, $P$, follow a Rayleigh distribution.
Thus, we suppose that the probability density
function of the signal amplitude, $P$, is

$$\frac{2P}{P_0^2}\exp\left(-\frac{P^2}{P_0^2}\right),$$

where $P_0$ is the long term root-mean-square value of the
signal amplitude. We shall also assume that, for given $P$,
the error probability with pulse position coding is

$$\tfrac{1}{2}\exp\left(-\frac{P^2}{4\psi_0}\right).$$

Then the probability of error with a fading signal, $p_F$, is
given by

$$p_F = \int_0^\infty \tfrac{1}{2}\exp\left(-\frac{P^2}{4\psi_0}\right)\frac{2P}{P_0^2}\exp\left(-\frac{P^2}{P_0^2}\right) dP = \frac{1}{2 + P_0^2/2\psi_0}. \tag{32}$$

The error in $p_F$ due to neglecting the remaining terms in
the series (14) is easily shown to be less than

$$\frac{8r^2}{(1 - r^2)(4 + P_0^2/2\psi_0)}.$$

For $r = 0.16$, which is about the maximum possible value
of $r$, this amounts to about 20 per cent of $p_F$.


⁴ G. F. Montgomery, “A comparison of amplitude and angle
modulation for narrow-band communication of binary-coded messages
in fluctuation noise,” Proc. IRE, vol. 42, pp. 447-454; February,
1954.
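The averaging over the Rayleigh distribution can be verified by direct numerical integration. The sketch below assumes $\psi_0 = 1$, a Rayleigh density $(2P/P_0^2)\exp(-P^2/P_0^2)$, and a conditional error probability $\tfrac12\exp(-P^2/4\psi_0)$; it compares a midpoint-rule integral with the closed form $1/(2 + a_0^2)$, where $a_0^2 = P_0^2/2\psi_0$. The integration range and step count are arbitrary choices for illustration.

```python
import math


def fading_error_closed_form(a0_sq):
    """Closed form p_F = 1 / (2 + a0^2), with a0^2 = P0^2 / (2 psi_0)."""
    return 1.0 / (2.0 + a0_sq)


def fading_error_numeric(a0_sq, upper=40.0, steps=40_000):
    """Midpoint-rule average of (1/2) exp(-P^2/4) over a Rayleigh amplitude.

    Assumption of this sketch: psi_0 = 1, hence P0^2 = 2 * a0_sq.
    """
    p0_sq = 2.0 * a0_sq
    h = upper / steps
    total = 0.0
    for i in range(steps):
        amp = (i + 0.5) * h
        conditional = 0.5 * math.exp(-amp * amp / 4.0)
        density = (2.0 * amp / p0_sq) * math.exp(-amp * amp / p0_sq)
        total += conditional * density
    return total * h


if __name__ == "__main__":
    for a0_sq in (1.0, 10.0, 100.0, 1000.0):
        print(a0_sq, fading_error_closed_form(a0_sq), fading_error_numeric(a0_sq))
```

The closed form falls off only as $1/a_0^2$, in contrast with the exponential decrease of the steady-signal result, which is the main qualitative effect of the fading.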


COMPARISON WITH OTHER SYSTEMS


It is interesting to compare the results for pulse position
coding, with and without a fading carrier, with the results
obtained by Montgomery⁴ for threshold detection of a
binary coded message. In Montgomery’s calculation one
binary digit is transmitted by transmitting or not transmitting
a pulse. The pulse is used to amplitude modulate
a carrier and is recovered in the receiver at the output of
an envelope detector. Such a system requires only half
the bandwidth of the pulse position code and a comparison
of the systems must take account of this fact.

First, we consider the case of no carrier fading. The
error rate in the pulse position system is, approximately,

$$p = \tfrac{1}{2}\exp\left(-\frac{a^2}{2}\right), \tag{8}$$

where $a^2$ is the signal-to-noise power ratio at the peak of
the pulse. The error rate for threshold detection as given
by Montgomery is, approximately,

$$p_T = \tfrac{1}{2}\exp\left(-\frac{a^2}{2}\right) + \tfrac{1}{4}\left[1 - \operatorname{erf}\left(\frac{a}{\sqrt{2}}\right)\right], \tag{33}$$


where erf $x$ is the error function, defined by

$$\operatorname{erf} x = \frac{2}{\sqrt{\pi}}\int_0^x e^{-y^2}\,dy.$$

In (33) the parameter $a^2$ is one half the signal-to-noise
ratio at the peak of a pulse. The factor one half is inserted
because a transmitter power which gives a signal-to-noise
ratio of $a^2$ in the pulse position system will give a
signal-to-noise ratio of $2a^2$ in the other system because of the
different bandwidths required. Thus, assuming that each
system uses the least possible bandwidth, equal values of
$a$ in (8) and (33) correspond to equal values of transmitter
power. It will be seen from (8) and (33) that the pulse
position system gives a slightly lower error rate. The
difference is significant only at low signal-to-noise ratios.
Moreover, as Montgomery points out, the assumptions
used in deriving (33) lead to a value of $p_T$ which is too
high at low signal-to-noise ratios. Hence there does not
appear to be any practical difference between the error
rates for the two systems.

We now consider the case of carrier fading. The corresponding
error probability for the pulse position system is

$$p_F = \frac{1}{2 + a_0^2}, \tag{32}$$

where $a_0^2$ is the mean signal-to-noise power ratio at the
pulse peak, $P_0^2/(2\psi_0)$. For the threshold detection system
with envelope detection, the error probability, $p_{TF}$, is

$$p_{TF} = \tfrac{1}{2}\left[1 - \frac{2a_0^2}{(1 + 2a_0^2)^{1 + 1/(2a_0^2)}}\right]. \tag{34}$$


It is interesting to compare these probabilities with the
error probability given by Montgomery⁴ for a synchronous
detector and a fading carrier. In this case, the system uses
a synchronous detector with automatic gain control to
maintain the threshold-to-signal ratio at its optimum
value. The error probability for this system, $p_{TS}$, is
given by

$$p_{TS} = \tfrac{1}{2}\left[1 - \frac{a_0}{\sqrt{a_0^2 + 2}}\right]. \tag{35}$$

The parameter $a_0$ has again been adjusted so that (32),
(34), and (35) may be compared directly.

The error probabilities $p_F$, $p_{TF}$, and $p_{TS}$ are plotted in
Fig. 2. It will be seen that $p_F$ falls between
$p_{TF}$ and $p_{TS}$ for the whole useful range. For mean
signal-to-noise ratios of more than 30 db (binary error
probability less than $10^{-3}$) the error rate with the pulse
position system is less than half that with the fixed-threshold,
envelope-detection system, and is approximately
twice the error rate of the synchronous system.

A similar comparison can be performed for the case
where synchronous detection is used in place of envelope
detection in the pulse position system. In this case, the
inputs at carrier peaks are compared at the two pulse
positions. As with the envelope detection system, the
pulse is said to be in the position in which the input is
greater. Without the operation of envelope detection the
mathematics is much simpler. The calculations will not be
reproduced here because they are very nearly the same as
those performed by Montgomery for the synchronous
detection of an amplitude modulated carrier. The result
is that the two systems perform equally well, both with
and without a fading carrier. That is, for the same error
rate, an on-off amplitude modulation system, with
synchronous detection and the optimum threshold setting,
requires twice the signal-to-noise ratio and one half the
bandwidth of the pulse position system with synchronous
detection. These statements are exactly true only when the
noise at the two pulse positions in a frame is uncorrelated.
However, the correlation is usually small enough so that
the statements are very nearly accurate. It should also
be mentioned that the pulse position system does not
require good automatic gain control such as that required
in the threshold detection system.


Fig. 2—The probability of error in three systems as a function of
average input signal-to-noise ratio in the pulse position system.
Carrier fading. (Curves: $p_F$, pulse position, envelope detection;
$p_{TF}$, fixed threshold, envelope detection; $p_{TS}$, synchronous
detection. Axes: probability of error versus input signal-to-noise
ratio in db.)
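The numerical statements about the three fading error rates can be checked with a few lines of code. The sketch below is hedged: the printed equations for the threshold systems are illegible in this copy, so the forms used for `p_tf` and `p_ts` are assumptions chosen to be consistent with the surrounding text (an optimum fixed threshold with envelope detection, and a half-amplitude threshold with synchronous detection), not confirmed transcriptions. Only `p_f` follows directly from (32).

```python
import math


def p_f(a0_sq):
    """Pulse position, envelope detection, Rayleigh fading: 1 / (2 + a0^2)."""
    return 1.0 / (2.0 + a0_sq)


def p_tf(a0_sq):
    """Fixed threshold, envelope detection (ASSUMED form).

    Taken here as (1/2) [1 - s / (1 + s)^(1 + 1/s)] with s = 2 a0^2.
    """
    s = 2.0 * a0_sq
    return 0.5 * (1.0 - s / (1.0 + s) ** (1.0 + 1.0 / s))


def p_ts(a0_sq):
    """Synchronous detection with ideal gain control (ASSUMED form).

    Taken here as (1/2) [1 - a0 / sqrt(a0^2 + 2)].
    """
    return 0.5 * (1.0 - math.sqrt(a0_sq / (a0_sq + 2.0)))


if __name__ == "__main__":
    for db in (10, 20, 30, 40):
        a0_sq = 10.0 ** (db / 10.0)
        print(f"{db:2d} db  p_F = {p_f(a0_sq):.3e}  p_TF = {p_tf(a0_sq):.3e}"
              f"  p_TS = {p_ts(a0_sq):.3e}")
```

With these assumed forms, at 30 db the pulse position error rate is a little under half the fixed-threshold rate and very nearly twice the synchronous rate, in line with the statements in the text.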

Figs. 1 and 2 can be used in conjunction with Montgomery’s
curves to obtain a comparison of the pulse
position system described here with the other modulation
methods (frequency and phase modulation) examined
by Montgomery. It should be remembered, however,
that these modulation methods could also be used with
pulse position coding in place of the amplitude modulation
considered here.


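APPENDIX


In this appendix we shall outline the main steps in
the calculation of $p$. Let

$$\rho_1 = \frac{R_1}{\sqrt{2\psi_0}}, \qquad \rho_2 = \frac{R_2}{\sqrt{2\psi_0}}, \qquad a = \frac{P}{\sqrt{2\psi_0}}, \tag{36}$$

$$r = \frac{(\mu_{13}^2 + \mu_{14}^2)^{1/2}}{\psi_0}, \tag{15}$$

and

$$F_1(\rho_1, \rho_2) = 2\psi_0\, F(\sqrt{2\psi_0}\,\rho_1, \sqrt{2\psi_0}\,\rho_2). \tag{37}$$

Clearly $R_1 > R_2$ if $\rho_1 > \rho_2$, and hence

$$p = \int_0^\infty d\rho_1 \int_{\rho_1}^\infty d\rho_2\, F_1(\rho_1, \rho_2). \tag{38}$$

From (10), (13), and (37)

$$F_1(\rho_1, \rho_2) = \frac{\rho_1\rho_2}{\pi^2(1 - r^2)} \int_0^{2\pi}\!\!\int_0^{2\pi} \exp\Big\{-\frac{1}{1 - r^2}\Big[\rho_1^2 + \rho_2^2 + a^2 - 2a\rho_1\cos\theta_1 - \frac{2\mu_{13}}{\psi_0}\big(\rho_1\rho_2\cos(\theta_2 - \theta_1) - a\rho_2\cos\theta_2\big) - \frac{2\mu_{14}}{\psi_0}\big(\rho_1\rho_2\sin(\theta_2 - \theta_1) - a\rho_2\sin\theta_2\big)\Big]\Big\}\, d\theta_1\, d\theta_2. \tag{39}$$

The integration with respect to $\theta_2$ is performed with the
aid of the integral representation

$$I_0(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{x\cos(\varphi + \delta)}\, d\varphi, \tag{40}$$

where $\delta$ has any value whatsoever and $I_0(x)$ is the modified
Bessel function of the first kind and order zero. Then

$$F_1(\rho_1, \rho_2) = \frac{2\rho_1\rho_2}{\pi(1 - r^2)} \int_0^{2\pi} \exp\left[-\frac{\rho_1^2 + \rho_2^2 + a^2 - 2a\rho_1\cos\theta_1}{1 - r^2}\right] I_0\!\left(\frac{2r\rho_2 q}{1 - r^2}\right) d\theta_1, \tag{41}$$

where

$$q = (\rho_1^2 + a^2 - 2a\rho_1\cos\theta_1)^{1/2}. \tag{42}$$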


The integrand in (41) is now expanded in a series of
ascending powers of $r$. The expansion is given by the
following bilinear generating function⁵ for the Laguerre
polynomials:

$$(1 - z)^{-1}\exp\left[-\frac{z(x + y)}{1 - z}\right](xyz)^{-\alpha/2}\, I_{\alpha}\!\left[\frac{2(xyz)^{1/2}}{1 - z}\right] = \sum_{n=0}^{\infty}\frac{n!}{\Gamma(n + \alpha + 1)}\, L_n^{\alpha}(x) L_n^{\alpha}(y)\, z^n \qquad (|z| < 1). \tag{43}$$

The Laguerre polynomials, $L_n^{\alpha}(x)$, are defined by (17).
If we write

$$p = \sum_{n=0}^{\infty} f_n r^{2n} \tag{14}$$

and use (38), (41), and (43), we have

$$f_n = \int_0^\infty d\rho_1 \int_{\rho_1}^\infty d\rho_2 \int_0^{2\pi} d\theta_1\, \frac{2\rho_1\rho_2}{\pi} \exp[-(\rho_1^2 + \rho_2^2 + a^2 - 2a\rho_1\cos\theta_1)]\, L_n(\rho_2^2)\, L_n(\rho_1^2 + a^2 - 2a\rho_1\cos\theta_1). \tag{44}$$


⁵ A. Erdélyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi,
“Higher Transcendental Functions,” McGraw-Hill Book Co., Inc.,
New York, N.Y., vol. 2; 1953. See eq. 10.12 (20).


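Now the integration with respect to $\rho_2$ can be performed
directly. We obtain⁶

$$2\int_{\rho_1}^\infty \rho_2\, L_n(\rho_2^2)\, e^{-\rho_2^2}\, d\rho_2 = e^{-\rho_1^2}\,[L_n(\rho_1^2) - L_{n-1}(\rho_1^2)]. \tag{45}$$

The other Laguerre function is expanded as a power
series in $\cos\theta_1$, giving

$$L_n(\rho_1^2 + a^2 - 2a\rho_1\cos\theta_1) = \sum_{k=0}^{n}\frac{(2a\rho_1\cos\theta_1)^k}{k!}\, L_{n-k}^{k}(\rho_1^2 + a^2). \tag{46}$$

Also, if (40) is differentiated $k$ times, we have

$$I_0^{(k)}(x) = \frac{d^k}{dx^k} I_0(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{x\cos\varphi}\cos^k\varphi\, d\varphi. \tag{47}$$

Thus, from (46) and (47),

$$\frac{1}{2\pi}\int_0^{2\pi} e^{2a\rho_1\cos\theta_1}\, L_n(\rho_1^2 + a^2 - 2a\rho_1\cos\theta_1)\, d\theta_1 = \sum_{k=0}^{n}\frac{1}{k!}\, L_{n-k}^{k}(a^2 + \rho_1^2)\,(2a\rho_1)^k\, I_0^{(k)}(2a\rho_1). \tag{48}$$

The combination of (44), (45), and (48) gives the expression
(16) for $f_n$.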
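The first two coefficients can be checked numerically against closed forms. The sketch below is an illustration under stated assumptions: it writes out the $n = 0$ and $n = 1$ integrands of the coefficient formula using $L_0 = 1$, $L_1(x) = 1 - x$, $L_0^{1} = 1$ and $I_0' = I_1$, evaluates the modified Bessel functions by their power series, integrates with a midpoint rule, and compares with $\tfrac12 \exp(-a^2/2)$ and $(a^4 - 4a^2)\exp(-a^2/2)/32$. The helper names, integration range, and step count are arbitrary choices.

```python
import math


def bessel_i(order, x, max_terms=200):
    """Modified Bessel function I_order(x), order 0 or 1, by its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = term
    for k in range(1, max_terms):
        term *= half * half / (k * (k + order))
        total += term
        if term < 1e-17 * total:
            break
    return total


def _midpoint(func, upper=6.0, steps=6000):
    """Midpoint-rule integral of func over (0, upper)."""
    h = upper / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))


def f0_numeric(a_sq):
    """n = 0 coefficient: 2 e^{-a^2} * integral of rho e^{-2 rho^2} I_0(2 a rho)."""
    a = math.sqrt(a_sq)
    integral = _midpoint(
        lambda rho: rho * math.exp(-2.0 * rho * rho) * bessel_i(0, 2.0 * a * rho))
    return 2.0 * math.exp(-a_sq) * integral


def f1_numeric(a_sq):
    """n = 1 coefficient, using L_1 - L_0 = -rho^2 and I_0' = I_1."""
    a = math.sqrt(a_sq)

    def integrand(rho):
        x = 2.0 * a * rho
        bracket = (1.0 - a_sq - rho * rho) * bessel_i(0, x) + x * bessel_i(1, x)
        return -rho ** 3 * math.exp(-2.0 * rho * rho) * bracket

    return 2.0 * math.exp(-a_sq) * _midpoint(integrand)


if __name__ == "__main__":
    for a_sq in (0.5, 2.0, 6.0):
        print(a_sq, f0_numeric(a_sq), 0.5 * math.exp(-a_sq / 2.0),
              f1_numeric(a_sq), (a_sq ** 2 - 4.0 * a_sq) / 32.0 * math.exp(-a_sq / 2.0))
```

The agreement of the numerical and closed-form columns gives an independent check on the first-order correction term in the series for $p$.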


⁶ Ibid., eq. 10.12 (29).


We can also obtain a bound for $f_n$ from (44). An upper
bound for the Laguerre polynomials is given⁷ by

$$|L_n(x)| \le e^{x/2}. \tag{49}$$

When this is substituted in (44), the result is

$$|f_n| \le 4e^{-a^2/2}\int_0^\infty \rho_1\, e^{-\rho_1^2}\, I_0(a\rho_1)\, d\rho_1 = 2\exp\left(-\frac{a^2}{4}\right). \tag{50}$$

The last integral was obtained from Weber’s first exponential
integral.⁸

Finally, the coefficients $f_n$ can always be evaluated
with the aid of Hankel’s formula:⁹

$$\int_0^\infty I_{\nu}(\alpha t)\exp(-p^2 t^2)\, t^{\mu - 1}\, dt = \frac{\Gamma(\nu/2 + \mu/2)\,(\alpha/2p)^{\nu}}{2p^{\mu}\,\Gamma(\nu + 1)}\; {}_1F_1(\nu/2 + \mu/2;\ \nu + 1;\ \alpha^2/4p^2), \tag{51}$$

where ${}_1F_1$ is a confluent hypergeometric function and $\alpha$
and $p$ are arbitrary constants.


⁷ Ibid., inequality 10.18 (8).

⁸ G. N. Watson, “Theory of Bessel Functions,” Cambridge
University Press, Cambridge, England, p. 393; 1948.

⁹ Ibid.

The Part of Statistical Considerations in the
Separation of a Signal Masked by a Noise*


JEAN A. VILLE†


Summary—The object of this paper is to demonstrate that the
stochastic considerations presently involved in signal detections
are purely descriptive and are not sufficiently developed to reach
the proposed aim.


INTRODUCTION


This paper intends to show which processes are
at present available to attempt to separate a
signal masked by a noise. These processes have to
be defined theoretically in order to be realized. According
to the literature, these processes appear to be very numerous,
but in fact, there is only a small number of them.

The optimum filter for given structures of the noise
and signal is satisfactorily defined, whether the adopted
solution appeals to the theories of Wiener (reduction
of the standard deviation), to Fisher (estimation of the
maximum likelihood), to Neyman (adjustment of the
probabilities of errors of first and second type), or to
Bayes (determination of the probabilities a posteriori).
All these theories may be checked practically by considering
an appropriate risk-function.


* Manuscript received by the PGIT, June 1, 1956.
† Societe Alsacienne de Const. Mecaniques, Paris 13, France.

For the realization, it is necessary to know: 1) The 
statistical definition of the noise and signal; 2) the 
analytical determination of the optimum-filtered signal; 
and 3) the physical realization of this analytical deter- 
mination. 

The most important work made up to this day concerns 
the second class of researches. 

We have attempted to show that it is the first and the 
third classes which actually determine the second one. 


THE FUNDAMENTAL OPERATIONS


To emphasize the philosophy of separation of a signal
masked by noise, we have to deal with two sets of signals:
1) A set of pure signals $s_1, s_2, \cdots, s_n$ and 2) a set of
corrupted signals $S_1, S_2, \cdots, S_m$.
The signals to be separated are the signals $s_i$; the signals
at our disposal are the signals $S_j$; the operation of separation
between signal and noise consists in a transformation
associating to each signal $S_j$ one, and only one, signal $s_i$.
This operation will be called filtering.

Filtering is a transformation with unique output;
after reception of a corrupted signal, a regenerative
circuit can only give one signal, which is, rightly or
wrongly, treated as being the best pure signal. If the
filtering is done in several steps, or if the regenerated
signal has to be retransmitted, it may not be necessary
or useful to ask for one signal; it may be asked that a
signal $S_j$ initiates several prefiltered intermediate signals
$S_{j1}, S_{j2}, \cdots$, a remixing of which can be performed in
a final filtering, or which are the inputs of several
transmission lines, the outputs of which are mixed to
give a final signal. We limit our considerations to a final
filtering, the output of which is a unique signal.

The performance of a filter is bounded to a comparison
between the filtered signal

$$\sigma = FS \tag{1}$$

and the pure signal which originated $S$. The change of $s$
(original signal) into $S$ can be symbolized by an operator
$T$ (transmission) which is generally a many-to-many
relation

$$S = Ts. \tag{2}$$


The filtering problem is then, $T$ being known, to find
$F$ so that $FT$, product of the two operators, shall be
an operator very like the unity (or neutral) operator:

$$FT \approx 1. \tag{3}$$

The object space of operator $T$ is the space of pure
signals; its image space is the space of corrupted signals.
The object space of $F$ is the same as the image space of $T$.
… realize the operator $F$. This space is a priori unknown, $F$
being the object of the research. Were it possible to
obtain exactly the equality (3), it would be an absolute
imposed condition that the image space of $F$ should be
the same as the object space of $T$, i.e., the space of pure
signals.

The equality (3) being only an approximate one, it
is not necessary to postulate the identity of the output
of $F$ with the pure signals.


THE RISK-FUNCTION


If the regeneration is only approximate, we have to
face three categories of signals: 1) The pure signals $s_i$;
2) the corrupted signals $S_j$; and 3) the regenerated signals
$\sigma_k$, and two transformations: The transformation $T$ of
transmission; and the transformation $F$ of regeneration,
and a “norm,” i.e., an evaluation of the damage consequent
to a false interpretation of $S_j$. This norm can be
a functional of two signals $s_i$ and $\sigma_k$:

$$R_{ik} = R[s_i, \sigma_k].$$

In all these definitions, nothing is postulated about
linearity of $T$, $F$, and about the stochastic characteristics
of the correspondence $T$ between pure and corrupted
signals.

The correspondence $T$ may become a one-to-one correspondence;
the regeneration can be impossible because this
one-to-one correspondence is impossible to reverse
physically. This is the case when $T$ is a transmission on
a delay line without distortion. The best we can do is to
take

$$\sigma(t) = s(t - \theta),$$

$\theta$ being the delay time of the line. In general, it is supposed
that a pure delay causes no damage at all, so that

$$R[s(t), s(t - \theta)] = 0.$$


When $T$ is a many-to-many correspondence, the
theoretical limit to an ideal regeneration is not the fact
that one $s_i$ generates several $S_j$, but the fact that one
$S_j$ is generated by several $s_i$. Concerning the $\sigma_k$, we may
suppose that $F$ is a many-to-one correspondence. This
is not in fact the case, any regenerative device having its
own noise, so that to one input signal are associated
several output signals; but we can look at this local
filtering noise as a secondary phenomenon.

The correspondence between the $S_j$ and the $\sigma_k$ being
a many-to-one correspondence, the choice of $F$, if the
space of the $\sigma_k$ is given, is only a matter of dividing the
space of $S_j$ in disjoint subspaces, and assigning to every
one of these subspaces a certain $\sigma_k$.


Wuy SrocHastic CoNSIDERATIONS ENTER 


Being given the s,, the S;, the o,, 7, and the matrix 
[R;,], if we are absolutely free in the choice of /, we have 
to minimize a certain quantity, depending on /. This 
quantity concerns the complete set of s, and S,; for a 
precise definition of it, we need further conventions. 
There, for the first time, stochastic considerations appear. 
If we consider all the binary relations included in the 
correspondence 7’, we find in general that, for every 
choice of F, there exists for a given s; a possible o, for 
which the damage R;, is very severe. [or continuous 
processes it may appear, more drastically, that for any 
s any o is possible; in this situation, any regeneration is 
impossible. If we look at the situation from a probabilistic 
point of view, we have at our disposal: 1) A weighting of 
the s,; and 2) a weighting of the binary relations between 
the s; and the S;, which, considered as a whole, frame the 
transformation 7’. 

With these weightings, it is possible, for every many- 
to-one correspondence F’, to define an average value of 
R,,. If we call p; and p;; the weightings 1) and 2), and if 
h(j) is the rank of the o, associated to S;, the average 
value of the R;; 1s: 


26 IRE TRANSACTIONS ON 
Rij a ») Pi p; Rt, h(y)] 


and the "best" $F$ is the $F$ or one of the several $F$'s which minimize $\bar{R}$. The transformations $F$ being defined when the function $h(j)$ is itself defined, the minimization of $\bar{R}$ is a pure algebraic problem. The difficulties are: 1) The choice of the $p_i$; 2) the choice of the $p_{ij}$; 3) the choice of the $\sigma_k$; 4) the choice of the "function of risk" $R_{ik}$; and 5) the realization of the correspondence $h(j)$.
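The algebraic character of this minimization can be illustrated by a short sketch. It rests on one observation: the double sum defining the average risk splits into one independent term per corrupted signal $S_j$, so $h(j)$ may be chosen separately for each $j$. The function name `best_assignment` and the toy numbers are hypothetical and serve only as illustration; physical realizability of $h(j)$ is ignored here.

```python
def best_assignment(p_signal, p_trans, risk):
    """Return (h, avg_risk): h[j] is the index k of the regenerated signal
    assigned to corrupted signal j, chosen to minimize
    sum_i sum_j p_signal[i] * p_trans[i][j] * risk[i][h[j]]."""
    n_pure = len(p_signal)
    n_corrupted = len(p_trans[0])
    n_regen = len(risk[0])
    h, total = [], 0.0
    for j in range(n_corrupted):
        # The double sum splits into one independent term per j.
        costs = [
            sum(p_signal[i] * p_trans[i][j] * risk[i][k] for i in range(n_pure))
            for k in range(n_regen)
        ]
        k_best = min(range(n_regen), key=costs.__getitem__)
        h.append(k_best)
        total += costs[k_best]
    return h, total


# Hypothetical toy numbers: two pure signals, three corrupted signals,
# two regenerated signals, and a 0/1 risk (1 = wrong regeneration).
p_signal = [0.5, 0.5]
p_trans = [[0.7, 0.2, 0.1],
           [0.1, 0.2, 0.7]]
risk = [[0.0, 1.0],
        [1.0, 0.0]]
h, avg_risk = best_assignment(p_signal, p_trans, risk)
```

With these numbers the middle corrupted signal is equally likely under both pure signals, so either assignment gives the same average risk; the sketch simply keeps the first.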


PHYSICAL LIMITS FOR F


Of the difficulties 1) to 5), the main one is the last. Not all correspondences between $S_j$ and $\sigma_k$ are physically realizable. There appears the fundamental intervention of physical time. If the pure signals $s_i$ are emitted in the interval $(0, T_1)$, if the corrupted signals $S_j$ are received in the interval $(T_2, T_3)$, and if the regenerated signals $\sigma_k$ are emitted in the interval $(T_4, T_5)$, the filtering device, in the general case, can work only if $T_4 > T_3$; i.e., the device has to store the whole signal $S_j$ before being able to emit the signal $\sigma_k$.

If the method of separation has to be extended to indefinite signals, the storage of the whole signal $S_j$ before emitting $\sigma_k$ is unthinkable; we must restrict the field of realizable $F$'s to the transformations which, to emit the signal $\sigma_k(\theta)$ (considered as a function of time), store only the values of $S_j(t)$ for $t < \theta$. If the signal $\sigma(\theta)$ for $t < \theta$ is compared with the signal $s(t)$ for $t < \theta - \tau$, $\tau$ is the time allowed for transmission and for the working of the filtering device. It appears that, the longer $\tau$ is, the less is the restriction imposed on $F$.

If the signals are pulsed signals, the physical time restriction is not drastic, theoretically, so long as a delay in the filtering is not damaging. But very strong restrictions are imposed by the technical impossibility of including a long storage in a physical filter. The only storage which is reasonably possible is a storage of energy, or any other characteristic available by integration.

In some cases, a conventional filter can provide the best solution. For instance, if $s(t)$ is a filtered Gaussian white noise, and if $S(t)$ is a sum:

$$S(t) = s(t) + b(t)$$

where $b(t)$ is another filtered Gaussian white noise, it is possible to show that a linear filtering can minimize $\bar{R}$, if we measure $R$ as a quadratic error. The results agree with the classical notions of signal/noise ratio. The linearity of the filtering results from the fact that the search for a minimum concerns quadratic forms, leading to linear equations; the presence of the quadratic forms is implied by the Gaussian expression of the probability laws. In Fig. 1, we see that the filter $F$ operates on a mixture of outputs of two linear filters; it is itself a linear filter. The minimization of the conventional output of $R$ is more and more perfect when the delay of the line between the outputs of $F_1$ (pure signal) and $F$ (regenerated signal) becomes longer.
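As an illustration of how a quadratic error leads to a linear filter, the following sketch uses the standard result of linear least-squares theory for the limiting case of unrestricted delay: with independent signal and noise, the best gain in each frequency band is $S_s/(S_s + S_b)$, where $S_s$ and $S_b$ are the signal and noise power densities. This formula is not stated in the text and is brought in as an assumption; the band-by-band (discrete) treatment and the spectra below are hypothetical.

```python
def wiener_gains(signal_psd, noise_psd):
    """Per-band gain H = S_s / (S_s + S_b) of the linear least-squares
    filter for independent signal and noise (no restriction on delay)."""
    return [s / (s + b) for s, b in zip(signal_psd, noise_psd)]


def residual_error(signal_psd, noise_psd, gains):
    """Quadratic error summed over bands for arbitrary real gains:
    (1 - H)^2 * S_s (signal distortion) + H^2 * S_b (noise passed)."""
    return sum((1 - h) ** 2 * s + h ** 2 * b
               for s, b, h in zip(signal_psd, noise_psd, gains))


# Hypothetical power densities in four adjacent bands.
signal_psd = [4.0, 2.0, 1.0, 0.5]
noise_psd = [1.0, 1.0, 1.0, 1.0]
gains = wiener_gains(signal_psd, noise_psd)
best = residual_error(signal_psd, noise_psd, gains)
unfiltered = residual_error(signal_psd, noise_psd, [1.0] * 4)
```

The error in each band is a quadratic form in the gain, so setting its derivative to zero gives a linear equation; this is the mechanism the paragraph above describes.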




When the signal, or the noise, are not Gaussian, so that $F_1$ or $F_2$ are not linear filters in our conventional diagram, $F$ is itself no longer linear. Suppose, for instance, that, $b$ being Gaussian, $s(t)$ is a filtered series of pulses emitted with a time-rule following Poisson's law. To extract these pulses from white noise, we have to set some thresholds, and to introduce some nonlinearity.


Fig. 1. The outputs of two white noise generators are filtered (by linear filters), mixed, and separated by linear filters. W. N. G. = white noise generators; L. F. = linear filter.


STOCHASTIC DESCRIPTIONS


Let us return to the different choices 1) to 4). The first two choices are a theoretical matter, interesting to the theorists of random functions. The two choices being of the same nature, let us examine the first. To randomize a category of signals is not easy work, because, apart from the filtered Gaussian white noise, the signals are not quite defined by their correlation functions of the first orders, in the same manner that a distribution function (density of probability), when multimodal, is not quite defined by its first moments.

For instance, a process of Poisson, with similar pulses appearing at random intervals of time, is not clearly described by considerations of correlation and spectrum. The proof of this inadequacy is simply that if a signaling series of pulses is mixed with a noisy series of pulses of double height, the two series having the same law of probability (and being independent of each other), it is very easy to filter by a detection of the amplitudes of peaks. That seems paradoxical, since the two series have the same spectrum, the energy of the noisy one being four times the energy of the signaling series.
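The amplitude argument can be made concrete with a small sketch. It assumes, for simplicity, sampled pulses of one sample width and no coincidence between a signaling pulse and a noisy pulse; the pulse times are hypothetical and fixed by hand rather than drawn from a Poisson law.

```python
def mix_pulses(n_samples, signal_times, noise_times):
    """Unit-height signaling pulses plus double-height noisy pulses."""
    mixed = [0.0] * n_samples
    for t in signal_times:
        mixed[t] += 1.0
    for t in noise_times:
        mixed[t] += 2.0
    return mixed


def detect_by_amplitude(mixed):
    """Keep only the samples whose peak lies between the two thresholds."""
    return [t for t, v in enumerate(mixed) if 0.5 < v < 1.5]


# Hypothetical pulse times; the two series never coincide.
signal_times = [3, 17, 40, 41, 77]
noise_times = [5, 18, 33, 60, 90]
mixed = mix_pulses(100, signal_times, noise_times)
recovered = detect_by_amplitude(mixed)
signal_energy = sum(1.0 ** 2 for _ in signal_times)
noise_energy = sum(2.0 ** 2 for _ in noise_times)
```

The noisy series carries four times the energy of the signaling series, yet two thresholds recover the signaling pulses exactly, which no consideration of spectrum alone would suggest.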


THE CORRELATION DETECTOR


The first aim of a correlation detector is to decide if, 
in a corrupted signal, a sinusoidal waveform is present. 

If the frequency of the hypothetical waveform were 
known, it would be sufficient to use a classical band-pass, 
the transmission band of which is centered on the said 
frequency $f_0$. The cutoff frequencies of the filter should 
be separated by a $\Delta f$ approximately equal to the reciprocal 
of the duration of the test. The indicial response of such 
a filter is a sinusoidal waveform, modulated by a low 
frequency signal, the duration of which is approximately 
the duration of the signal to be tested. The output of the 
filter is a modulated sinusoidal waveform, and the 
probability of a sinusoidal component in the tested signal 
is related to the energy of the output. Such a filter is very difficult to design, because it has to store energy during a long time, and this is possible only with a great number of lumped elements. The correlation method is a by-pass which avoids this difficulty; as we are not interested in regenerating the waveform itself, but in detecting its presence or absence, we have to integrate the energy of the output of the filter. It appears that the combination of the two operations of filtering and integrating can be done more economically if we substitute for them equivalent operations, for instance two modulations and two quadratic integrations. It is easy to show that this new combination is equivalent to the computation of an element of the spectrum of the correlation function; narrowing of $\Delta f$ is automatically realized by the broadening of observation time, at the end of which the total filtered energy is stored to be compared to a reference level.¹

When the frequency of the sinusoidal waveform is unknown, the elementary solution of the problem is clearly to apply the corrupted signal at the input of a collection of band-pass filters with adjacent transmission bands, and to detect the largest output.
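One reading of the "two modulations and two quadratic integrations" can be sketched as follows: the record is multiplied by a cosine and by a sine at the test frequency, each product is integrated over the observation time, and the two integrals are squared and added. This is a sketch under stated assumptions, not the circuit of the references: the signal is sampled, the test frequency makes a whole number of cycles in the record, and the noise is left out so that the result is exact. All names and numbers are hypothetical.

```python
import math


def quadrature_energy(samples, cycles):
    """Two modulations (by cos and sin) followed by two integrations,
    then squaring and adding. `cycles` is the number of whole periods
    of the test frequency contained in the observation window."""
    n = len(samples)
    i_sum = sum(x * math.cos(2 * math.pi * cycles * k / n)
                for k, x in enumerate(samples))
    q_sum = sum(x * math.sin(2 * math.pi * cycles * k / n)
                for k, x in enumerate(samples))
    return i_sum ** 2 + q_sum ** 2


def detect_frequency(samples, candidate_cycles):
    """Unknown frequency: try every candidate and keep the largest output."""
    return max(candidate_cycles, key=lambda c: quadrature_energy(samples, c))


n = 256
true_cycles, amplitude, phase = 12, 0.8, 0.7
samples = [amplitude * math.cos(2 * math.pi * true_cycles * k / n + phase)
           for k in range(n)]
energy = quadrature_energy(samples, true_cycles)
estimate = detect_frequency(samples, range(1, 40))
```

For a sinusoid of amplitude $A$ observed over $N$ samples, the output at the matching frequency is $(AN/2)^2$ whatever the phase. The output grows with the square of the observation time while the frequency resolution sharpens in proportion to it.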

Of course, the comparison can be made only if the transmission bands are related to the density spectrum of the supposed noise. The filtering must be as sharp as the duration of the signal allows. For long durations, we shall have an unthinkable number of filters. The correlation method permits avoiding this difficulty, but without exhausting the information. This would require that the correlation function should be analyzed in its Fourier components, which is the same problem as that first proposed. However, one step has been made; instead of the spectrum of the corrupted signal, we have to deal with its square, so that the peaks are more apparent.²


THE LIMITING OF NOISE PEAKS: PECULIAR CASE OF COMPRESSION-EXPANSION


It can be shown that a peak limiting can improve the signal/noise ratio. This can advantageously happen where: 1) The noise peaks are significantly high; and 2) The signal and the noise are not both Gaussian. When the noise and the signal are almost Gaussian, a limitation of peaks, or some other nonlinear detection, does not improve the classical results very much (with linear filtering).

¹ E. Reich and P. Swerling, "The detection of a sine wave in Gaussian noise," J. Appl. Phys., vol. 24, pp. 289-296; March, 1953.

² M. Horowitz and A. A. Johnson, "Theory of noise in a correlation detector," IRE Trans., vol. IT-1, pp. 3-5; December, 1955.

³ M. D. Indjoudjian, "Le filtrage et la prediction des messages selon Norbert Wiener," ("La Cybernétique, Théorie du Signal et de l'Information"), Rev. d'Optique, (Paris), pp. 35-53; 1951.

Ville: Separation of a Signal Masked by a Noise

When the noise, being not Gaussian, can be defined by 
an instantaneous transformation of a Gaussian noise, 
we can, by using an expansion or compression of ampli- 
tudes, put ourselves in conditions where the linear filtering 
is adequate; the transformation we apply to the corrupted 
signal does not affect the results obtainable by the method 
of maximum likelihood, which are invariable under 
every one-to-one change of variables. 
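The invariance of maximum likelihood under a one-to-one change of variables can be checked on a small example. The model below is a hypothetical choice made only for illustration: the observations are $y = \exp(\theta + z)$ with $z$ Gaussian, so that the compression $v = \log y$ restores an additive Gaussian noise for which the linear operation (the mean) is adequate. The observations, the value of `sigma` and the search grid are assumptions.

```python
import math


def loglik_original(theta, ys, sigma):
    """Log-likelihood of theta from the raw observations
    y = exp(theta + z), z ~ N(0, sigma^2): the log-normal density of y."""
    return sum(-math.log(y * sigma * math.sqrt(2 * math.pi))
               - (math.log(y) - theta) ** 2 / (2 * sigma ** 2) for y in ys)


def loglik_compressed(theta, ys, sigma):
    """Log-likelihood of theta from the compressed observations
    v = log(y), which are Gaussian with mean theta."""
    return sum(-math.log(sigma * math.sqrt(2 * math.pi))
               - (math.log(y) - theta) ** 2 / (2 * sigma ** 2) for y in ys)


def argmax_on_grid(loglik, ys, sigma, grid):
    """Brute-force maximum likelihood over a grid of candidate values."""
    return max(grid, key=lambda theta: loglik(theta, ys, sigma))


ys = [1.8, 2.9, 2.2, 3.6, 2.5]            # hypothetical observations
grid = [k / 1000 for k in range(0, 2001)]  # candidate values of theta
theta_original = argmax_on_grid(loglik_original, ys, 0.3, grid)
theta_compressed = argmax_on_grid(loglik_compressed, ys, 0.3, grid)
theta_linear = sum(math.log(y) for y in ys) / len(ys)
```

The two likelihoods differ only by a term that does not contain $\theta$ (the Jacobian of the change of variables), so both searches select the same grid point, and that point is the grid value nearest to the mean of the compressed observations.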


THE MAXIMUM LIKELIHOOD METHOD


The maximum likelihood method, consisting in considering as the best regenerated signal the signal for which the probability density function of the corrupted signal is the greatest, is very easy to explain theoretically. It is more difficult to compute the error made in the application, because the classical methods of computation give the error only in asymptotic form, i.e., for a great number of observations.

Dealing with the technical realization, we have first 
to describe the signal with a certain number of parameters, 
to find (mathematically) the expression of the estimation 
of these parameters as a function of the observations 
made on the corrupted signal, and then to form a device 
calculating these estimated values as a function of the 
observations. 

Of course, the regeneration of the signal by a generator 
having these estimated parameters as inputs is generally 
omitted as useless, as is the case for instance when we 
deal with target detection. But the computation (of 
course automatic) of the estimated parameters is very 
intricate when the distribution of noise and description 
of signal are not simple. 


CONCLUSION 


If we refer to the three divisions of research which are 
pointed out in the introduction, it is to be noted that the 
stochastic definition of noise and signal is most important 
in a filtering design for a particular aim, and all energies 
must be devoted to this task. 

At the present time, these definitions exist but in such 
terms that they do not correspond to a single physical 
realization, except in the case of a filtered white Gaussian 
noise, so that the present practical methods lead to 
practical results only for linear filters and compressor- 
expandor arrangements or to theoretically equivalent 
arrangements, such as those using correlation methods. 




On a Cross-Correlation Property for 
Stationary Random Processes 


JOHN L. BROWN, JR.t 


Summary: Given two stationary random processes $x_1(t)$ and $x_2(t)$, the cross-correlation property of interest is the following: If one of the two processes is distorted by an instantaneous nonlinear device, then the cross correlation after the distortion is proportional to the cross-correlation function prior to the distortion.

Using an expansion of the second-order joint probability distribution $p(x_1, x_2)$ introduced by Barrett and Lampard, a necessary and sufficient condition for the above cross-correlation property is given in terms of requirements on the expansion coefficients.

In certain cases, the constant of proportionality involved in the cross-correlation property is equal to the "equivalent gain" of the nonlinear device as defined by Booton. A necessary and sufficient condition for these two constants to be identical is formulated in terms of the expansion coefficients of $p(x_1, x_2)$. The class of distributions satisfying this condition is a subclass of the set of distributions for which the cross-correlation property is valid.


INTRODUCTION 


THE cross-correlation property to be studied in this paper may be stated as follows: A: If $x_1(t)$ and $x_2(t)$ are a pair of stationary time series, one of which undergoes amplitude distortion in a fixed "instantaneous" nonlinear device, then the cross-correlation function after the distortion is proportional to the cross-correlation function before the distortion. The constant of proportionality depends on the particular nonlinear device considered, but is independent of time in the stationary case.

It was first shown by Bussgang¹ that property A holds when the joint probability distribution of $x_1(t)$ and $x_2(t)$, $p(x_1, x_2)$, is Gaussian. In attempting to generalize Bussgang's result to a wider class of distributions, Barrett and Lampard² expanded the second-order distribution $p(x_1, x_2)$ in a double series of orthonormal polynomials, the particular polynomials being determined by the first-order distributions, $p_1(x_1)$ and $p_2(x_2)$. When the coefficient matrix of this expansion is a diagonal matrix, they demonstrated that property A is valid even in the more general case when $x_1$ and $x_2$ are nonstationary.

The class, $\Lambda$, of second-order distributions having a diagonal coefficient matrix with respect to the expansion includes the Gaussian distribution (thus giving Bussgang's result as a special case) and several other distributions of practical importance.³

In this paper, we shall make use of the Barrett-Lampard expansion for $p(x_1, x_2)$ and derive a necessary and sufficient condition for property A in terms of the expansion coefficients. We shall also discuss the relation between the constant of proportionality involved in the cross-correlation property and Booton's "equivalent gain"⁴ for the nonlinear device under consideration, again deriving a necessary and sufficient condition for the two quantities to be identical.

* Manuscript received by the PGIT, August 7, 1956.

† Ordnance Res. Lab., Pennsylvania State University, Pa.

¹ J. J. Bussgang, "Crosscorrelation Functions of Amplitude-Distorted Gaussian Signals," Mass. Inst. Tech., Res. Lab. Electronics, Cambridge, Mass., Tech. Rep. No. 216; March 26, 1952.

² J. F. Barrett and D. G. Lampard, "An expansion for some second-order probability distributions and its application to noise problems," IRE Trans., vol. IT-1, pp. 10-15; March, 1955.

³ Ibid.


GENERAL PROPERTIES OF THE EXPANSION 


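We let $p(x_1, x_2)$ denote the second-order joint probability distribution⁵ of $x_1$ and $x_2$, noting that this distribution depends only on the time difference $\tau$ between $t_1$ and $t_2$ when $x_1$ and $x_2$ are stationary processes. Two sets of orthonormal polynomials $\{\theta_n^{(1)}(x_1)\}$ and $\{\theta_n^{(2)}(x_2)\}$ are constructed having weighting functions $p_1(x_1)$ and $p_2(x_2)$ respectively, where $p_1(x_1)$ and $p_2(x_2)$ are the corresponding first-order probability distributions. These first-order distributions are given by⁶

$$p_1(x_1) = \int p(x_1, x_2)\,dx_2, \qquad p_2(x_2) = \int p(x_1, x_2)\,dx_1. \tag{1}$$

The expansion of $p(x_1, x_2)$ in terms of the polynomials has the form

$$p(x_1, x_2) = p_1(x_1)\,p_2(x_2) \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} a_{mn}(\tau)\,\theta_m^{(1)}(x_1)\,\theta_n^{(2)}(x_2) \tag{2}$$

where

$$a_{mn}(\tau) = a_{mn} = \iint p(x_1, x_2)\,\theta_m^{(1)}(x_1)\,\theta_n^{(2)}(x_2)\,dx_1\,dx_2 \tag{3}$$

and the orthonormality conditions are⁷

$$\int p_1(x_1)\,\theta_m^{(1)}(x_1)\,\theta_n^{(1)}(x_1)\,dx_1 = \delta_{mn}, \qquad \int p_2(x_2)\,\theta_m^{(2)}(x_2)\,\theta_n^{(2)}(x_2)\,dx_2 = \delta_{mn}. \tag{4}$$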


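Since $p(x_1, x_2)$, $p_1(x_1)$, and $p_2(x_2)$ are probability distributions, the following relations obtain

$$\begin{aligned}
&\iint p(x_1, x_2)\,dx_1\,dx_2 = 1, \\
&\int p_1(x_1)\cdot 1\,dx_1 = \int p_2(x_2)\cdot 1\,dx_2 = 1, \\
&\int p_1(x_1)(x_1 - \mu_1)\cdot 1\,dx_1 = 0, \qquad \int p_2(x_2)(x_2 - \mu_2)\cdot 1\,dx_2 = 0, \\
&\int p_1(x_1)\Big(\frac{x_1 - \mu_1}{\sigma_1}\Big)\Big(\frac{x_1 - \mu_1}{\sigma_1}\Big)\,dx_1 = 1, \qquad \int p_2(x_2)\Big(\frac{x_2 - \mu_2}{\sigma_2}\Big)\Big(\frac{x_2 - \mu_2}{\sigma_2}\Big)\,dx_2 = 1,
\end{aligned} \tag{5}$$

where $\mu_1$, $\sigma_1$ are the mean and standard deviation of $x_1(t)$, and $\mu_2$, $\sigma_2$ are the corresponding quantities for $x_2(t)$. These relations imply

$$\theta_0^{(1)}(x_1) = \theta_0^{(2)}(x_2) = 1, \qquad \theta_1^{(1)}(x_1) = \frac{x_1 - \mu_1}{\sigma_1}, \qquad \theta_1^{(2)}(x_2) = \frac{x_2 - \mu_2}{\sigma_2}. \tag{6}$$

⁴ R. C. Booton, Jr., "The analysis of nonlinear control systems with random inputs," Proc. Symposium on Nonlinear Circuit Analysis, Bklyn. Polytechnic Inst., pp. 369-391; April 23-24, 1953.

⁵ That is, $p(x_1, x_2)\,dx_1\,dx_2$ gives the joint probability of the inequalities $x_1 < x_1(t_1) < x_1 + dx_1$ and $x_2 < x_2(t_2) < x_2 + dx_2$. This function is commonly referred to as the "joint frequency" or "joint density" function in the literature.

⁶ All integrals are to be understood as extending over the whole range of variation of the random variables involved.

⁷ $\delta_{mn}$ is the usual Kronecker delta defined by $\delta_{mn} = 0$ for $m \ne n$ and $\delta_{mn} = 1$ for $m = n$.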


Now consider $\langle [\theta_m^{(1)}(x_1) + \lambda\,\theta_n^{(2)}(x_2)]^2 \rangle$.⁸ This average is equal to

$$\iint p(x_1, x_2)\,[\theta_m^{(1)}(x_1) + \lambda\,\theta_n^{(2)}(x_2)]^2\,dx_1\,dx_2,$$

which reduces to the quantity

$$1 + 2\lambda a_{mn} + \lambda^2$$

using expansion (2) for $p(x_1, x_2)$ and the orthonormal relations (4). Since this quantity must be non-negative for all real values of $\lambda$, it is easily verified that

$$|a_{mn}| \le 1 \tag{7}$$

for all $m$, $n$. In particular, $|a_{nn}| \le 1$, as established by Barrett and Lampard in a similar manner for the special case in which $\{a_{mn}\}$ is a diagonal matrix.


If we define the cross-correlation function between $x_1(t)$ and $x_2(t)$ by

$$\psi_{12}(\tau) = \iint (x_1 - \mu_1)(x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2 \tag{8}$$

then, by using the expansion for $p(x_1, x_2)$ and the orthonormal properties of the polynomials, it can be shown that

$$a_{11}(\tau) = \frac{\psi_{12}(\tau)}{\sigma_1\sigma_2} = \frac{\langle (x_1 - \mu_1)(x_2 - \mu_2) \rangle}{\sigma_1\sigma_2}. \tag{9}$$

Another important property of the coefficients $a_{mn}$ in (2) is that $a_{n0} = a_{0n} = 0$ for $n \ne 0$. This may be shown as follows:

$$p_2(x_2) = \int p(x_1, x_2)\,dx_1 = \int p_1(x_1)\,p_2(x_2) \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} a_{mn}\,\theta_m^{(1)}(x_1)\,\theta_n^{(2)}(x_2)\,dx_1 = \sum_{n=0}^{\infty} a_{0n}\,\theta_n^{(2)}(x_2)\,p_2(x_2)$$

upon interchanging the operations and using the orthonormality relations (4) to simplify the expression. For $p_2(x_2) \ne 0$, we conclude that⁹

$$\sum_{n=0}^{\infty} a_{0n}\,\theta_n^{(2)}(x_2) = 1.$$

But this equality indicates that the quantity on the left is the expansion of the function which is identically equal to "one" over the range of variation of $x_2$. Since $\theta_0^{(2)}(x_2) = 1$ from (6), and since the expansion is assumed to exist and to be unique,

$$a_{00} = 1 \quad \text{and} \quad a_{0n} = 0 \ \text{for} \ n > 0. \tag{10}$$

A similar argument shows $a_{n0} = 0$ for $n > 0$.

⁸ $\langle\ \rangle$ denotes ensemble averaging.
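The general properties (7), (9) and (10) can be checked numerically on a small case. The sketch below replaces the densities by a discrete joint distribution on a $3 \times 3$ grid, which is an assumption made so that all integrals become finite sums; the polynomials are built by Gram-Schmidt orthonormalization of $1, x, x^2$ with the marginal distributions as weights. The function names and the joint table are hypothetical.

```python
import math


def orthonormal_polys(values, weights):
    """Gram-Schmidt on 1, x, x^2, ... with the inner product
    <f, g> = sum_k weights[k] * f(values[k]) * g(values[k]).
    Each polynomial is returned as its list of values on `values`."""
    basis = []
    for degree in range(len(values)):
        vec = [v ** degree for v in values]
        for b in basis:
            proj = sum(w * f * g for w, f, g in zip(weights, vec, b))
            vec = [f - proj * g for f, g in zip(vec, b)]
        norm = math.sqrt(sum(w * f * f for w, f in zip(weights, vec)))
        basis.append([f / norm for f in vec])
    return basis


def expansion_coefficients(joint, xs, ys):
    """a[m][n] = sum_ij joint[i][j] * theta1_m(xs[i]) * theta2_n(ys[j]),
    the discrete counterpart of (3)."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    th1, th2 = orthonormal_polys(xs, p1), orthonormal_polys(ys, p2)
    return [[sum(joint[i][j] * t1[i] * t2[j]
                 for i in range(len(xs)) for j in range(len(ys)))
             for t2 in th2] for t1 in th1]


def correlation_coefficient(joint, xs, ys):
    """psi_12 / (sigma_1 * sigma_2), computed directly from the table."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    mu1 = sum(p * x for p, x in zip(p1, xs))
    mu2 = sum(p * y for p, y in zip(p2, ys))
    s1 = math.sqrt(sum(p * (x - mu1) ** 2 for p, x in zip(p1, xs)))
    s2 = math.sqrt(sum(p * (y - mu2) ** 2 for p, y in zip(p2, ys)))
    cov = sum(joint[i][j] * (xs[i] - mu1) * (ys[j] - mu2)
              for i in range(len(xs)) for j in range(len(ys)))
    return cov / (s1 * s2)


# Hypothetical joint distribution of (x1, x2) on a 3 x 3 grid.
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
joint = [[0.20, 0.05, 0.05],
         [0.05, 0.20, 0.05],
         [0.05, 0.10, 0.25]]
a = expansion_coefficients(joint, xs, ys)
rho = correlation_coefficient(joint, xs, ys)
```

For this table one finds, up to rounding, $a_{00} = 1$, a vanishing first row and first column apart from $a_{00}$, all entries bounded by one in magnitude, and $a_{11}$ equal to the correlation coefficient.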


ANALYSIS


The cross-correlation property A is expressed mathematically by

$$\text{A:} \quad \Psi_{12}(\tau) = K(f)\,\psi_{12}(\tau), \tag{11}$$

where

$$\Psi_{12}(\tau) = \iint [f(x_1) - \langle f(x_1) \rangle](x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2 \tag{12}$$

and

$$\psi_{12}(\tau) = \iint (x_1 - \mu_1)(x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2. \tag{13}$$

$K(f)$ is a linear functional depending on the function $f(x)$, which characterizes the nonlinear device and is independent of $\tau$.

Actually, the term $\langle f(x_1) \rangle$ in the integrand of (12) may be taken as zero, since this term contributes nothing to the integral as seen from the following argument:

$$\begin{aligned}
\iint \langle f(x_1) \rangle (x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2
&= \langle f(x_1) \rangle \iint (x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2 \\
&= \langle f(x_1) \rangle \int p_2(x_2)(x_2 - \mu_2)\,dx_2 \\
&= \langle f(x_1) \rangle (\mu_2 - \mu_2) = 0.
\end{aligned}$$

Thus, $\Psi_{12}(\tau)$ may be written equivalently as

$$\Psi_{12}(\tau) = \iint f(x_1)(x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2. \tag{12a}$$

⁹ The arguments and manipulations used in this paper are purely formal. No questions of convergence or legitimacy of interchanging operations are considered here. It is assumed that the class of functions treated is suitably restricted so that the required expansions exist.
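Theorem 1

For stationary processes, $\Psi_{12}(\tau) = K(f)\,\psi_{12}(\tau)$, if and only if there exists a sequence of real constants $\{d_n\}$, $n = 1, 2, \cdots$ independent of $\tau$, with $d_1 = 1$ such that

$$a_{n1}(\tau) = d_n\,a_{11}(\tau), \qquad n = 1, 2, 3, \cdots.$$

Proof

We first show that the stated condition implies property A. Let

$$f(x_1) = \sum_{n=0}^{\infty} c_n\,\theta_n^{(1)}(x_1),$$

with

$$c_n = \int f(x_1)\,\theta_n^{(1)}(x_1)\,p_1(x_1)\,dx_1.$$

Then,

$$\langle f(x_1) \rangle = \int f(x_1)\,p_1(x_1)\,dx_1 = c_0.$$

Thus,

$$f(x_1) - \langle f(x_1) \rangle = \sum_{n=1}^{\infty} c_n\,\theta_n^{(1)}(x_1)$$

and

$$\Psi_{12}(\tau) = \iint \Big[\sum_{k=1}^{\infty} c_k\,\theta_k^{(1)}(x_1)\Big](x_2 - \mu_2)\Big[\sum_{m=0}^{\infty}\sum_{n=0}^{\infty} a_{mn}(\tau)\,\theta_m^{(1)}(x_1)\,\theta_n^{(2)}(x_2)\Big]\,p_1(x_1)\,p_2(x_2)\,dx_1\,dx_2.$$

But

$$x_2 - \mu_2 = \sigma_2\,\theta_1^{(2)}(x_2).$$

Using relations (4) and interchanging operations, the expression for $\Psi_{12}(\tau)$ reduces to

$$\Psi_{12}(\tau) = \sigma_2 \sum_{k=1}^{\infty} a_{k1}(\tau)\,c_k. \tag{14}$$

By assumption, $a_{k1} = d_k a_{11}$, with $d_1 = 1$. Substituting in (14),

$$\Psi_{12}(\tau) = \Big[\sigma_2 \sum_{k=1}^{\infty} d_k c_k\Big]\,a_{11}(\tau).$$

From (9),

$$\Psi_{12}(\tau) = \Big[\frac{1}{\sigma_1}\sum_{k=1}^{\infty} d_k c_k\Big]\,\psi_{12}(\tau). \tag{15}$$

Thus, defining $K(f)$ as $\sum_{k=1}^{\infty} d_k c_k / \sigma_1$,¹⁰ (15) becomes

$$\Psi_{12}(\tau) = K(f)\,\psi_{12}(\tau)$$

as required.

¹⁰ This definition involves only the coefficients of $f(x_1)$ and the $\{d_n\}$, which are assumed to be known.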


Conversely, assume that for each $f(x)$, a real number $K(f)$ is given such that property A holds. Substituting the expansion for $p(x_1, x_2)$ in the integrands of both $\Psi_{12}(\tau)$ and $\psi_{12}(\tau)$ and reducing, results in the expression

$$\sum_{k=1}^{\infty} a_{k1}(\tau)\,c_k = K(f)\,a_{11}(\tau)\,\sigma_1. \tag{16}$$

Since $K(f)$ is assumed to be defined for all $f(x)$, let $K[\theta_n^{(1)}(x)] = h_n$. The $\{c_m\}$ corresponding to $f(x) = \theta_n^{(1)}(x)$ are $c_m = \delta_{mn}$. For this choice of $f(x)$, (16) gives

$$a_{n1} = h_n\,\sigma_1\,a_{11}, \qquad n = 1, 2, \cdots. \tag{17}$$

Note in the above that $h_1 = 1/\sigma_1$. Defining $d_n = h_n\sigma_1$, we have from (17),

$$a_{n1} = d_n\,a_{11} \quad \text{for} \quad n = 1, 2, \cdots \quad \text{with} \quad d_1 = 1$$

as required.

In the case where $p(x_1, x_2)$ has a diagonal matrix, $a_{n1} = 0$ for $n \ne 1$, and, consequently, we may choose the trivial sequence $\{d_1 = 1,\ d_n = 0 \ \text{for} \ n > 1\}$ to show that the cross-correlation property holds. This special result for diagonal matrices was previously established by Barrett and Lampard in their paper.


RELATION TO BOOTON'S "EQUIVALENT GAIN"


In his analysis¹¹ Booton approximates the output $f(x_1)$ of an instantaneous nonlinear device by

$$f(x_1) \approx K_e(f)\,(x_1 - \mu_1) \tag{18}$$

where $x_1(t)$ is the input, and $K_e(f)$ is a constant to be determined. The criterion used by Booton to fix $K_e(f)$ is that the mean square difference between $K_e(f)\cdot(x_1 - \mu_1)$ and $f(x_1)$ be made a minimum; that is,

$$\int [f(x_1) - K_e(f)\,(x_1 - \mu_1)]^2\,p_1(x_1)\,dx_1$$

is to be minimized. Expanding and setting the derivative with respect to $K_e$ equal to zero gives the formula

$$K_e(f) = \int f(x_1)\,\frac{x_1 - \mu_1}{\sigma_1^2}\,p_1(x_1)\,dx_1. \tag{19}$$

$K_e(f)$ is then termed the "equivalent gain" of the nonlinear device. It is interesting to note that this is the same as the expression for the $K(f)$ appearing in the cross-correlation property as found by Barrett and Lampard.¹²,¹³ Thus, for distributions having a diagonal coefficient matrix $\{a_{mn}\}$, the constant of proportionality in the cross-correlation property is identical to the "equivalent gain" of the nonlinear device being considered.

It is natural to ask if this identity can be extended to a wider class of distributions; the following theorem gives an answer in terms of the expansion coefficients $a_{mn}$. We first note that (19) can be written

$$K_e(f) = \frac{c_1}{\sigma_1}, \tag{20}$$

where $c_1 = \int f(x_1)\,\theta_1^{(1)}(x_1)\,p_1(x_1)\,dx_1$ as before. Since this is always the representation for $K_e(f)$, the question reduces to a matter of when $\Psi_{12}(\tau)$ is equal to $(c_1/\sigma_1)\,\psi_{12}(\tau)$.

¹¹ Booton, loc. cit.

¹² Barrett and Lampard, loc. cit.

¹³ In (34) of the Barrett-Lampard paper, the factor $p_2(x_2)$ has been omitted from the integrand in the representation of the proportionality constant.
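Formula (19) lends itself to direct numerical evaluation. The sketch below assumes a Gaussian first-order distribution $p_1$ and evaluates the integral by the midpoint rule; the integration range, the step count and the two devices (an ideal hard limiter and a linear amplifier of gain 3) are hypothetical choices made for illustration.

```python
import math


def equivalent_gain(f, mu, sigma, half_width=10.0, steps=20_000):
    """Midpoint-rule evaluation of (19):
    K_e = integral of f(x) * (x - mu) / sigma**2 * p1(x) dx,
    with p1 taken (as an assumption) to be Gaussian."""
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
             / (sigma * math.sqrt(2 * math.pi)))
        total += f(x) * (x - mu) / sigma ** 2 * p * dx
    return total


def hard_limiter(x):
    """Ideal limiter: output +1 or -1 according to the sign of the input."""
    return 1.0 if x >= 0 else -1.0


sigma = 2.0
gain_limiter = equivalent_gain(hard_limiter, 0.0, sigma)
gain_linear = equivalent_gain(lambda x: 3.0 * x, 0.0, sigma)
```

For the limiter with zero-mean Gaussian input the integral has the closed form $\sqrt{2/\pi}/\sigma_1$, and for a linear device the equivalent gain is simply the gain itself; both values are reproduced by the sketch.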


Theorem 2

$\Psi_{12}(\tau) = K_e\,\psi_{12}(\tau)$ if and only if $a_{n1}(\tau) = 0$ for $n \ge 2$.

Proof

As outlined in the preceding theorem,

$$\Psi_{12}(\tau) = \sigma_2 \sum_{k=1}^{\infty} a_{k1}(\tau)\,c_k$$

and

$$\psi_{12}(\tau) = \sigma_1\sigma_2\,a_{11}(\tau).$$

Therefore, if $\Psi_{12}(\tau) = K_e\,\psi_{12}(\tau) = (c_1/\sigma_1)\,\psi_{12}(\tau)$, we have

$$\sigma_2 \sum_{k=1}^{\infty} a_{k1}\,c_k = c_1\,\sigma_2\,a_{11} \quad \text{for all} \ f(x_1), \tag{21}$$

or

$$\sum_{k=2}^{\infty} a_{k1}\,c_k = 0 \quad \text{for all} \ f(x_1).$$

The particular choice of $f(x_1) = \theta_n^{(1)}(x_1)$ yields $c_k = \delta_{kn}$ and implies

$$a_{n1} = 0 \quad \text{for} \quad n \ge 2,$$

thus establishing the necessity of the condition.

Conversely, if $a_{n1}(\tau) = 0$ for $n \ge 2$, then

$$\Psi_{12}(\tau) = \sigma_2 \sum_{k=1}^{\infty} a_{k1}(\tau)\,c_k = \sigma_2\,a_{11}(\tau)\,c_1.$$

Since

$$a_{11}(\tau) = \frac{\psi_{12}(\tau)}{\sigma_1\sigma_2},$$

$$\Psi_{12}(\tau) = \frac{c_1}{\sigma_1}\,\psi_{12}(\tau) = K_e\,\psi_{12}(\tau), \ \text{as required.}$$

If we define $\Lambda^*$ to be the class of distributions $p(x_1, x_2)$ for which $a_{n1} = 0$ $(n \ge 2)$, then the class of distributions with diagonal matrices $\Lambda$¹⁴ is included in $\Lambda^*$. Whenever $p(x_1, x_2)$ belongs to $\Lambda^*$, the constant of proportionality in the cross-correlation property is equal to the "equivalent gain" for the nonlinear device; further, $p(x_1, x_2)$ must belong to $\Lambda^*$ for this to be true.

¹⁴ Ibid.


GENERALIZATION 


The restriction to stationary processes in the above arguments is purely a matter of notational convenience, since only ensemble averages are involved, and time is regarded essentially as a parameter. In the event that $x_1(t)$ and $x_2(t)$ are nonstationary, $a_{mn}$ depends on two parameters, $t_1$ and $t_2$, and both $\Psi_{12}$ and $\psi_{12}$ also depend on $t_1$ and $t_2$. The polynomials $\theta_n^{(1)}(x_1)$ and $\theta_n^{(2)}(x_2)$ will, in general, depend on $t_1$ and $t_2$ respectively, and, consequently, the expansion coefficients of $f(x_1)$ will be functions of $t_1$. $K(f)$ will no longer be independent of time, but will likewise depend on the parameter $t_1$; however, the constants $d_n$ of theorem 1 are still required to be real constants exhibiting no time dependence.


CONCLUSION 


The correlation property treated in this paper states that the cross correlation between two random processes, one of which has undergone an instantaneous nonlinear distortion, is proportional to the cross correlation before distortion. A necessary and sufficient condition for this property to hold has been given in terms of the coefficient matrix obtained when $p(x_1, x_2; \tau)$ is expanded in a double series of orthonormal polynomials determined by the first-order probability distributions of $x_1(t)$ and $x_2(t)$. This condition is essentially equivalent to requiring that the property hold for the infinite sequence of nonlinear devices, $\{F_n\}$, where the response of $F_n$ to an input $x$ is $\theta_n^{(1)}(x)$. Thus, if the property holds for this sequence of "orthonormal nonlinear devices," it will hold for an arbitrary instantaneous nonlinear device and conversely.

We have also determined under what conditions the 
constant of proportionality involved in the correlation 
property is equal to Booton’s ‘“‘equivalent gain” for the 
nonlinear device. The condition stated shows that when- 
ever the two constants are equal, the cross-correlation 
property also holds, although the converse is not 
necessarily true. 

For Gaussian processes, Booton's analysis shows that the difference $f(x_1) - K_e x_1$ is uncorrelated with $x_1$; that is, the output of the nonlinear device can be represented as the sum of two terms, where one of the terms is the result of a linear operation on the input, and the other is uncorrelated with the input. If the correlation property holds, then this type of representation for the output of the nonlinear device is always possible. Thus, in any case where the random process considered is such that $p(x_1, x_2)$ belongs to $\Lambda^*$, Booton's approximation of $f(x_1)$ by $K_e x_1$ results in an error term which is uncorrelated with the input, $x_1$. Since the second-order distribution of a constant amplitude sine wave belongs to $\Lambda^*$ as shown by Barrett and Lampard, the linearization effected by the "describing function" is also of the type leading to an error which is uncorrelated with the sine wave input.




A Systematic Approach to a Class of Problems in the 
Theory of Noise and Other Random Phenomena 


Part I*


D. A. DARLING† AND A. J. F. SIEGERT‡


Summary: The problem of finding the probability distribution of the functional

$$\int_{t_0}^{t} \Phi(X(\tau), \tau)\,d\tau,$$

where $X(\tau)$ is a (multidimensional) Markoff process and $\Phi(X, \tau)$ is a given function, appears in many forms in the theory of noise and other random phenomena. We have shown that a certain function from which this probability distribution can be obtained is the unique solution of two integral equations. We also developed a perturbation formalism which relates the solutions of the integral equations belonging to two different functions $\Phi(X, \tau)$. If the transition probability density for $X(\tau)$ is the principal solution of two partial differential equations of the Fokker-Planck-Kolmogoroff type, the principal solution of two similar differential equations is the solution of the integral equations. As an example, we calculated the probability distribution of the sample probability density for a stationary Markoff process.


INTRODUCTION 


IN THE theory of noise and similar random phenomena, a small number of special problems have been solved by various special methods. Each of these methods seems to apply only to the particular problem for which it was developed or at best to a rather restricted class of problems. It seemed of interest, therefore, to develop a systematic approach to a wider class of problems, which contains as special cases most of the problems solved before. Even though this approach leads to rather formidable integral or differential equations, so that the number of new problems which can be solved exactly will be small, it leads to a perturbation formalism for problems "in the neighborhood" of those permitting exact solutions.
We consider the problem of finding the probability distribution of the random variable

$$u = \int_{t_0}^{t} \Phi(X(\tau), \tau)\,d\tau \tag{1}$$

where $X(\tau)$ is a Markoff process with components $x_1(\tau), x_2(\tau), \cdots, x_n(\tau)$. This problem arose originally as the problem of finding the probability distribution of the noise output of a radio receiver consisting of a linear amplifier, an arbitrary detector, and a second linear amplifier. Let $x(\tau)$ be the output voltage of the first amplifier at a time $\tau > 0$ before observation,¹ $g[x(\tau)]$ the output voltage of the detector at the same time, and $K(\tau)$ the output of the second amplifier at the time of observation if a $\delta$-function pulse is applied to it at the time $\tau$. The output voltage $V$ of the second amplifier in response to $x(\tau)$ is then

$$V = \int_{0}^{t} K(\tau)\,g[x(\tau)]\,d\tau \tag{2}$$

if the noise was turned on at time $t > 0$ before observation. If the input of the first amplifier is white noise, $x(\tau)$ is a Gaussian random function. This fact has made it possible to reduce the problem for the special case $g(x) = x^2$ to the solution of an integral equation in one variable only. Except for this special form of the detector function, and of course in the trivial case $g(x) = x$, the Gaussian property of $x(\tau)$ does not simplify the problem.

* Manuscript received by the PGIT, August 13, 1956. Most of this work was done while the authors were consultants for The RAND Corp., Santa Monica, Calif.

† Dept. of Mathematics, University of Michigan, Ann Arbor, Mich.

‡ Dept. of Physics, Northwestern University, Evanston, Ill.

If the first amplifier is equivalent to a network with lumped circuit elements and its input is white noise, $x(\tau)$ is also a component of a Markoff process. This led us to consider the more general problem stated above which has many applications apart from the noise output of radio receivers.

If, for instance, a domain $\Omega$ is chosen in $X$ space, and $\Phi(X, \tau)$ is defined by

$$\Phi(X, \tau) = \begin{cases} 1 & \text{when } X \text{ is in } \Omega \\ 0 & \text{otherwise} \end{cases}$$

then $u$ is that part of the time $(t - t_0)$ during which $X$ is in $\Omega$ in the time interval $(t_0, t)$, and $(t - t_0)^{-1}u$ is an estimate for the probability that $X$ is in $\Omega$, obtained from the finite sample. The distribution of $u$ is thus of importance if it is desired to estimate the accuracy with which the probability distribution of time homogeneous processes $X(\tau)$ can be obtained from finite samples.
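For a sampled path, the functional $u$ with this indicator choice of $\Phi$ is simply the time spent in the domain. The sketch below approximates the integral by a Riemann sum; the path values, the step `dt` and the two domains ($x < 0.5$ and $x < 2$) are hypothetical and are not drawn from any particular Markoff process.

```python
def occupation_time(path, dt, in_domain):
    """Riemann-sum approximation of u = integral of Phi(X(tau)) d tau,
    where Phi is 1 when X lies in the domain and 0 otherwise."""
    return sum(dt for x in path if in_domain(x))


def occupation_fraction(path, dt, in_domain):
    """u / (t - t0): the finite-sample estimate of P(X in domain)."""
    return occupation_time(path, dt, in_domain) / (dt * len(path))


# Hypothetical sampled path of a one-dimensional process, step dt = 0.5.
path = [0.1, 0.4, 0.9, 1.3, 0.8, 0.2, -0.3, -0.6, 0.0, 0.7]
dt = 0.5
u = occupation_time(path, dt, lambda x: x < 0.5)
fraction = occupation_fraction(path, dt, lambda x: x < 0.5)
u_below_two = occupation_time(path, dt, lambda x: x < 2.0)
```

When the domain is a half-line $x < a$, the occupation time equals the whole observation time exactly when the path never reaches $a$, which is how the distribution of $u$ gives the distribution of the maximum.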
If, specially, $\Omega$ is defined by

$$x_1 < a,$$

with $x_2, x_3, \cdots, x_n$ unrestricted, the probability for $u = t - t_0$ is the probability that $x_1(t') < a$ for all $t'$ in $t_0 < t' < t$ (except for a set of measure zero) and, for a continuous function $x_1(\tau)$, this is the cumulative distribution of the absolute maximum of $x_1(t)$ in the interval $(t_0, t)$, if it is considered as function of $a$, or the cumulative probability distribution of the one-sided first-passage time, if it is considered as function of $t$.

If $\Omega$ is defined by

$$a < x_1 < b$$

with $x_2, x_3, \cdots, x_n$ unrestricted, one obtains in a similar way for continuous $x_1(\tau)$ the distribution of the two-sided first-passage time (escape time), and the distribution of the range $= \max_\tau x_1(\tau) - \min_\tau x_1(\tau)$, $t_0 < \tau < t$. For one-dimensional Markoff processes the problem of first-passage time, range, and maximum has been solved by an earlier method.²

¹ It is convenient to choose the time scale positive into the past.

The problem of finding the distribution of the empirical spectrum or the Fourier coefficients obtained from a sample can also be formulated as a special case of our problem. If $\psi_1(\tau), \psi_2(\tau), \cdots, \psi_n(\tau)$ are given functions, for instance trigonometric functions, the characteristic function

$$\Big\langle \exp\Big\{ i \sum_k \xi_k \int_{t_0}^{t} x_1(\tau)\,\psi_k(\tau)\,d\tau \Big\} \Big\rangle$$

for the joint distribution of the Fourier coefficients is also the characteristic function

$$\Big\langle \exp\Big\{ i\xi \int_{t_0}^{t} \Phi(x_1(\tau), \tau)\,d\tau \Big\} \Big\rangle$$

for the random variable $u$ defined by (1), if one chooses

$$\Phi(X(\tau), \tau) = \xi^{-1}\,x_1(\tau) \sum_k \xi_k\,\psi_k(\tau). \tag{3}$$


INTEGRAL EQUATIONS FOR THE CONDITIONAL CHARACTERISTIC FUNCTION, PERTURBATION FORMULA, AND DIFFERENTIAL EQUATION


In the present paper we present a heuristic derivation³ of two integral equations for the function

$$r(X_0, t_0 \mid X, t; \lambda) \equiv \left\langle \exp\left\{ -\lambda \int_{t_0}^{t} \Phi(X(\tau), \tau)\, d\tau \right\} \,\Big|\, X(t_0) = X_0,\ X(t) = X \right\rangle_{\text{av}} \cdot\, p(X_0, t_0 \mid X, t), \qquad (4)$$

where $\langle\ \mid\ \rangle_{\text{av}}$ denotes the average of the functional on the left of the vertical bar under the condition written on its right, and where $p(X_0, t_0 \mid X, t)\, dX$ is the probability that $X(t)$ is in the volume element $dX$ at $X$, if $X(t_0) = X_0$. The parameter $\lambda$ will be chosen positive real if $\Phi$ is non-negative,


² A. J. F. Siegert, "On the first passage time probability problem," Phys. Rev., vol. 81, pp. 617-623; February 15, 1951.

D. A. Darling and A. J. F. Siegert, "The first passage problem for a continuous Markoff process," Ann. Math. Stat., vol. 24, pp. 624-639; December, 1953, and work quoted there.

³ A rigorous derivation has been given by D. A. Darling and A. J. F. Siegert, "On the Distribution of Certain Functionals of Markoff Processes," The RAND Corp., Rep. P-429; April, 1954. Appeared in abbreviated form, Proc. Natl. Acad. Sci.; August, 1956. A short sketch of the method was given by Siegert, "Passage of stationary processes through linear and non-linear devices," IRE Trans., vol. PGIT-3, pp. 4-25; March, 1954.




and negative imaginary otherwise. The Fourier or Laplace transform of the probability density for the variable $u$ defined by (1) is obviously $r/p$ if initial and end conditions are imposed, $\int r\, dX$ if only initial conditions are imposed, and $\iint W(X_0, t_0)\, r\, dX\, dX_0$ if no conditions are imposed, where $W(X_0, t_0)$ is the probability density for $X(t_0)$.

Consider now $X(\tau)$ as the path of a particle in $X$ space. If, at first, $\Phi(X(\tau), \tau)$ is assumed to be non-negative and $\lambda$ real and positive, then $\lambda \Phi(X, \tau)\, d\tau$ can be interpreted as the probability of a "collision" at the point $X$ in the time interval $(\tau, \tau + d\tau)$. A "collision" is thereby understood to be an event which does not affect the path of the particle nor the probability of later collisions, but leaves a mark on the particle so that the number of collisions experienced by the particle can be counted. The functional $\exp[-\lambda \int_{t_0}^{t} \Phi(X(\tau), \tau)\, d\tau]$ is thus the probability that the particle suffers no collisions on a path $X(\tau)$ which leads from $X_0$ to $X$; and $r(X_0, t_0 \mid X, t; \lambda)\, dX$ is the probability of finding the particle at time $t$ in the volume element $dX$ at $X$ without any marks, if it started at $X_0$ at time $t_0$. An integral equation for $r(X_0, t_0 \mid X, t; \lambda)$ is obtained by subtracting from $p(X_0, t_0 \mid X, t)\, dX$ the probability that the particle reached some point $X'$ in the volume element $dX'$ at some time $t'$ without collisions, suffered the first collision there in the time interval $(t', t' + dt')$, and went on from there to $X$ suffering an arbitrary and irrelevant number of collisions. One thus has the integral equation

$$r(X_0, t_0 \mid X, t; \lambda) = p(X_0, t_0 \mid X, t) - \lambda \int_{t_0}^{t} dt' \int dX'\, r(X_0, t_0 \mid X', t'; \lambda)\, \Phi(X', t')\, p(X', t' \mid X, t). \qquad (5)$$


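Repeating the same argument with the last collision one obtains

$$r(X_0, t_0 \mid X, t; \lambda) = p(X_0, t_0 \mid X, t) - \lambda \int_{t_0}^{t} dt' \int dX'\, p(X_0, t_0 \mid X', t')\, \Phi(X', t')\, r(X', t' \mid X, t; \lambda). \qquad (6)$$

A formal derivation which removes the restrictions $\lambda > 0$, $\Phi(X(\tau), \tau) \ge 0$ is given in Appendix I.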

Since the integral equations will in general be difficult to solve in closed form, a perturbation formalism seems of value. Suppose that the solution of the integral equations is $r_1(X_0, t_0 \mid X, t; \lambda)$ for $\Phi(X, t) = \Phi_1(X, t)$. Let now a second scattering medium be added, such that the probability of a collision at $X$ in $(t, t + dt)$ is increased by $\lambda[\Phi_2(X, t) - \Phi_1(X, t)]\, dt$. By repetition of the argument given above one then obtains two integral equations for the solution $r_2(X_0, t_0 \mid X, t; \lambda)$ of (5) and (6), when $\Phi(X, t) = \Phi_2(X, t)$:

$$r_2(X_0, t_0 \mid X, t; \lambda) = r_1(X_0, t_0 \mid X, t; \lambda) - \lambda \int_{t_0}^{t} dt' \int dX'\, r_2(X_0, t_0 \mid X', t'; \lambda)\, [\Phi_2(X', t') - \Phi_1(X', t')]\, r_1(X', t' \mid X, t; \lambda) \qquad (7)$$

$$r_2(X_0, t_0 \mid X, t; \lambda) = r_1(X_0, t_0 \mid X, t; \lambda) - \lambda \int_{t_0}^{t} dt' \int dX'\, r_1(X_0, t_0 \mid X', t'; \lambda)\, [\Phi_2(X', t') - \Phi_1(X', t')]\, r_2(X', t' \mid X, t; \lambda). \qquad (8)$$


We can thus obtain successive approximations for $r_2$, if $r_1$ is known, by the usual iteration procedure.

A formal derivation independent of the restrictions imposed on $\lambda$ and $\Phi$ has been given.³ This derivation serves, furthermore, to prove the uniqueness of the solutions of (5) and (6).
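The iteration procedure can be illustrated on a finite-state analog. The sketch below is not taken from the paper: it assumes a two-state continuous-time Markov chain with a hypothetical generator `q`, time-independent collision rates `phi1` and `phi2`, and works with Laplace transforms in $t$, where (7) takes the matrix form $R_2 = R_1 - \lambda R_2 D R_1$ with $D = \mathrm{diag}(\Phi_2 - \Phi_1)$. The helper names are illustrative only.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def resolvent(q, phi, lam, s):
    """Laplace transform of r for the chain: (sI - Q + lam * diag(phi))^(-1)."""
    return inv2([[(s + lam * phi[i]) * (i == j) - q[i][j] for j in range(2)]
                 for i in range(2)])


def iterate_r2(r1, d, lam, n_iter=30):
    """Successive approximations for r2, starting from r2 = r1 (analog of (7))."""
    r2 = r1
    for _ in range(n_iter):
        r2_d = [[r2[i][k] * d[k] for k in range(2)] for i in range(2)]
        corr = matmul(r2_d, r1)  # R2 * D * R1
        r2 = [[r1[i][j] - lam * corr[i][j] for j in range(2)] for i in range(2)]
    return r2


# Hypothetical example data (not from the paper).
q = [[-1.0, 1.0], [2.0, -2.0]]   # jump rates, rows sum to zero
phi1, phi2 = [1.0, 0.0], [1.2, 0.3]
lam, s = 0.5, 1.0
r1 = resolvent(q, phi1, lam, s)
r2_iterated = iterate_r2(r1, [phi2[k] - phi1[k] for k in range(2)], lam)
r2_exact = resolvent(q, phi2, lam, s)
```

The iteration converges here because the perturbation $\lambda D R_1$ is small; for a large perturbation the series need not converge and a direct solution would be required.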

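In many cases of practical interest $p(X_0, t_0 \mid X, t)$ is the principal solution of two partial differential equations of the form

$$\frac{\partial p}{\partial t} = L p \qquad (9)$$

$$-\frac{\partial p}{\partial t_0} = L^* p \qquad (10)$$

where $L$ is defined by⁴

$$L p = \frac{1}{2} \sum_{k,l} \frac{\partial^2}{\partial x_k\, \partial x_l} [B_{kl}(X, t)\, p] - \sum_k \frac{\partial}{\partial x_k} [A_k(X, t)\, p] \qquad (11)$$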


and $L^*$ is the adjoint of this operator with $X$ and $t$ replaced by $X_0$ and $t_0$. The physical meaning of (10) is that of a continuity equation

$$\frac{\partial p}{\partial t} + \operatorname{Div} J = 0 \qquad (12)$$

where Div is the divergence and $J$ is the probability current in $X$ space. The current $J$ can be interpreted as a diffusion current with components $-\tfrac{1}{2} \sum_l B_{kl}\, \partial p / \partial x_l$ and a drift current with components $(A_k - \tfrac{1}{2} \sum_l \partial B_{kl} / \partial x_l)\, p$ if one wants to keep the form $-\tfrac{1}{2} B \cdot \operatorname{Grad} p$ for the diffusion current. (If one prefers to retain the drift current in the form $V_k p$, where $V_k$ is the average velocity, one interprets $A_k p$ as the drift current and $-\tfrac{1}{2} \sum_l \partial / \partial x_l\, (B_{kl}\, p)$ as diffusion current.)
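Formal application of the operator $L - \partial/\partial t$ to (5) yields

$$\left(L - \frac{\partial}{\partial t}\right) r(X_0, t_0 \mid X, t; \lambda) = \lambda \int dX'\, r(X_0, t_0 \mid X', t; \lambda)\, \Phi(X', t)\, p(X', t \mid X, t). \qquad (13)$$

With the initial condition

$$p(X', t \mid X, t) = \delta(X - X') \qquad (14)$$

this becomes

$$\frac{\partial}{\partial t}\, r(X_0, t_0 \mid X, t; \lambda) = [L - \lambda \Phi(X, t)]\, r(X_0, t_0 \mid X, t; \lambda). \qquad (15)$$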
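The step from (5) to (13) can be written out as follows. This is a reconstruction of the intermediate algebra, assuming that differentiation under the integral sign is permitted; it is not spelled out in the original text.

```latex
\begin{aligned}
\left(L - \frac{\partial}{\partial t}\right) r(X_0,t_0 \mid X,t;\lambda)
 &= \left(L - \frac{\partial}{\partial t}\right) p(X_0,t_0 \mid X,t) \\
 &\quad - \lambda \int_{t_0}^{t} dt' \int dX'\,
      r(X_0,t_0 \mid X',t';\lambda)\,\Phi(X',t')
      \left(L - \frac{\partial}{\partial t}\right) p(X',t' \mid X,t) \\
 &\quad + \lambda \int dX'\,
      r(X_0,t_0 \mid X',t;\lambda)\,\Phi(X',t)\,p(X',t \mid X,t).
\end{aligned}
```

The first two terms vanish because $p$ satisfies (9); the third arises from differentiating the upper limit of the $t'$ integration and is the right-hand side of (13).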


⁴ M. C. Wang and G. E. Uhlenbeck, "On the theory of the Brownian motion II," Rev. Mod. Phys., vol. 17, pp. 323-342; April-July, 1945.

A. Kolmogoroff, "Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung," Math. Ann., vol. 104, pp. 415-458; March, 1931.


The interpretation of (15) is clearly that the rate of particle loss by collisions $\lambda \Phi(X, t)$ has been added to the continuity equation. Formal application of the operator $(L^* + \partial/\partial t_0)$ to (6) yields in the same way the differential equation

$$-\frac{\partial}{\partial t_0}\, r(X_0, t_0 \mid X, t; \lambda) = [L^* - \lambda \Phi(X_0, t_0)]\, r(X_0, t_0 \mid X, t; \lambda). \qquad (16)$$

We showed³ that the principal solution of either (15) or (16), if it exists, is actually a solution of (5) and (6) and is, therefore, by virtue of the uniqueness theorem, the solution of these integral equations.


These equations include as special cases the integral and differential equations derived by Kac,⁵ Rosenblatt,⁶ and Fortet.⁷ When $X(\tau)$ is taken to be the one-dimensional Wiener function $x(\tau)$ (once integrated "white noise") with $x(0) = 0$, and $\Phi[X(\tau), \tau] = V(x)$, one obtains from (5) the integral equation (3.8) of Kac,⁵ and the Laplace transform of (15) reduces to equation (3.14) of Kac.⁵ When the components $x_k(\tau)$ of $X(\tau)$ are Wiener functions with $x_k(0) = 0$, (5) reduces to equation (1.9) of Rosenblatt.⁶ The differential equation (16) was derived directly by Fortet.⁷

If $X(\tau)$ is a Gaussian Markoff process, and $\Phi(X(\tau), \tau) = K(\tau)\, x_1^2(\tau)$, the method of Kac and Siegert⁸ can be applied also, and leads to an integral equation in the time variable only. In this case the solution of (15) is an exponential function of a second-degree polynomial in the components of $X_0$ and $X$, and (15) leads to first-order nonlinear differential equations for the coefficients. The equivalence of these to the integral equation of Kac and Siegert⁸ requires a somewhat lengthy discussion and will be given in Part II of this paper. It seems interesting to note that the present procedure yields some of the results of Kac and Siegert⁸ in closed form.


EXAMPLE: THE DISTRIBUTION OF THE SAMPLE PROBABILITY DENSITY FOR A STATIONARY MARKOFF PROCESS


It is often necessary to infer the probability density for a random process from a sample. If the process is stationary a convenient estimate $w^*(z)$ of the probability density $w(z)$ is the fraction of the sample length during which the value of the random process $X(\tau)$ lies in a small interval or volume element $\Delta$ centered on $z$, divided by $\Delta$.

The calculation will be carried through for the Markoff


⁵ M. Kac, "On some connections between probability theory and differential and integral equations," Proc. Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, Calif., pp. 189-215; 1951.

⁶ M. Rosenblatt, "On a class of Markov processes," Trans. Amer. Math. Soc., vol. 71, pp. 120-135; July, 1951.

⁷ A. Blanc-Lapierre and R. Fortet, "Théorie des Fonctions Aléatoires," Masson et Cie, Paris, 321 pp.; 1953.

⁸ M. Kac and A. J. F. Siegert, "On the theory of noise in radio receivers with square law detectors," J. Appl. Phys., vol. 18, pp. 383-397; April, 1947. For an improvement of this method, see Siegert, "Passage of stationary processes through linear and non-linear devices," IRE Trans., vol. PGIT-3, pp. 4-25; March, 1954.


process with one component $x(\tau)$; the generalization to an $n$-dimensional process is trivial, if one is interested in the joint distribution of the components. [It must be emphasized, however, that the generalization to the sample probability of one component of a multidimensional Markoff process is not trivial, since the integral equation does not simplify appreciably in that case.] It would be easy, on the other hand, to generalize our calculation to obtain the joint distribution of $w^*(z_1), w^*(z_2), \ldots, w^*(z_n)$. This distribution may be useful in obtaining an approximation to the distribution of

$$\int \Phi(z)\, w^*(z)\, dz$$

if $\Phi(z)$ and $w^*(z)$ are slowly varying functions so that the integral can be approximated by $\sum_{j=1}^{n} \Phi(z_j)\, w^*(z_j)\, \Delta_j$.

Since we have restricted our problem to stationary Markoff functions we will use the notation

$$r(x_0 \mid x, t; \lambda) \equiv r(x_0, t_0 \mid x, t_0 + t; \lambda) \qquad (17)$$

$$p(x_0 \mid x, t) \equiv p(x_0, t_0 \mid x, t_0 + t). \qquad (18)$$


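The estimate $w^*(z)$ defined above can be written in the form

$$w^*(z) = u/t$$

where $u$ is defined by (1), with $t_0 = 0$ and $\Phi$ a function of $x$ only, which is defined by

$$\Phi(x) = \begin{cases} 1/\Delta & \text{for } |x - z| < \Delta/2, \\ 0 & \text{otherwise.} \end{cases} \qquad (19)$$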
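As an illustration of this estimate, the following sketch simulates a sample path and evaluates $w^*(z) = u/t$ with $\Phi$ as in (19). The choice of process is an assumption made for the example: a stationary Gaussian Markoff (Uhlenbeck) process with unit variance and correlation $e^{-\beta|\tau|}$, sampled on a discrete time grid. The function name, step size, and seed are hypothetical.

```python
import math
import random


def sample_density_estimate(z, delta, t_total, beta=1.0, dt=0.01, seed=0):
    """Estimate w*(z): fraction of time spent in |x - z| < delta/2, divided by delta."""
    rng = random.Random(seed)
    decay = math.exp(-beta * dt)
    kick = math.sqrt(1.0 - decay * decay)  # keeps the stationary variance equal to 1
    x = rng.gauss(0.0, 1.0)                # start in the stationary distribution
    steps = int(t_total / dt)
    inside = 0
    for _ in range(steps):
        # Exact one-step update of the Uhlenbeck process on the time grid.
        x = x * decay + kick * rng.gauss(0.0, 1.0)
        if abs(x - z) < delta / 2.0:
            inside += 1
    u = inside * dt / delta                # u = integral of Phi along the path
    return u / (steps * dt)                # w*(z) = u / t


w_star = sample_density_estimate(z=0.0, delta=0.2, t_total=2000.0)
w_true = 1.0 / math.sqrt(2.0 * math.pi)    # Gaussian density at z = 0
```

For a long sample the estimate should lie close to `w_true`; how close is the question the remainder of this section answers.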
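Eq. (6) becomes

$$r(x_0 \mid x, t; \lambda) = p(x_0 \mid x, t) - \frac{\lambda}{\Delta} \int_0^t dt' \int_{z - \Delta/2}^{z + \Delta/2} dx'\, p(x_0 \mid x', t')\, r(x' \mid x, t - t'; \lambda). \qquad (20)$$

In the limit $\Delta \to 0$ this equation simplifies to the integral equation

$$r(x_0 \mid x, t; \lambda) = p(x_0 \mid x, t) - \lambda \int_0^t dt'\, p(x_0 \mid z, t')\, r(z \mid x, t - t'; \lambda) \qquad (21)$$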


which can be solved by taking Laplace transforms. If $r_L$ and $p_L$ denote the Laplace transforms of $r$ and $p$,

$$r_L(x_0 \mid x, s; \lambda) = \int_0^\infty e^{-st}\, r(x_0 \mid x, t; \lambda)\, dt, \qquad (22)$$

one obtains

$$r_L(x_0 \mid x, s; \lambda) = p_L(x_0 \mid x, s) - \lambda\, p_L(x_0 \mid z, s)\, r_L(z \mid x, s; \lambda). \qquad (23)$$

To solve (23) for $r_L(x_0 \mid x, s; \lambda)$ we first put $x_0 = z$ and obtain

$$r_L(z \mid x, s; \lambda) = p_L(z \mid x, s) - \lambda\, p_L(z \mid z, s)\, r_L(z \mid x, s; \lambda) \qquad (24)$$

and

$$r_L(z \mid x, s; \lambda) = p_L(z \mid x, s) / [1 + \lambda\, p_L(z \mid z, s)]. \qquad (25)$$

Substituting this result into (23) yields

$$r_L(x_0 \mid x, s; \lambda) = p_L(x_0 \mid x, s) - \lambda\, p_L(x_0 \mid z, s)\, p_L(z \mid x, s) / [1 + \lambda\, p_L(z \mid z, s)]. \qquad (26)$$
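The algebra leading to (26) can be checked exactly on a finite-state analog. The sketch below is an illustration under stated assumptions, not the paper's continuous-state setting: it assumes a three-state continuous-time Markov chain with a hypothetical generator `q`, lets the "collision" rate act only in state `z`, and compares the direct resolvent with the right-hand side of (26). Exact rational arithmetic is used so that the two results can be compared for equality.

```python
from fractions import Fraction


def inverse(matrix):
    """Gauss-Jordan inverse of a small square matrix with Fraction entries."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def check_equation_26(s=Fraction(1, 2), lam=Fraction(3), z=1):
    # Hypothetical generator: q[i][j] is the jump rate i -> j, rows sum to zero.
    q = [[-3, 2, 1],
         [1, -1, 0],
         [2, 2, -4]]
    n = len(q)
    # p_L = (sI - Q)^(-1): Laplace transform of the transition matrix.
    p_l = inverse([[Fraction(s * (i == j) - q[i][j]) for j in range(n)]
                   for i in range(n)])
    # Direct route: add the loss rate lam in state z and invert again.
    r_direct = inverse([[Fraction(s * (i == j) - q[i][j] + lam * (i == j == z))
                         for j in range(n)] for i in range(n)])
    # Route of (26): correct p_L by the term proportional to lam.
    r_formula = [[p_l[i][j] - lam * p_l[i][z] * p_l[z][j] / (1 + lam * p_l[z][z])
                  for j in range(n)] for i in range(n)]
    return p_l, r_direct, r_formula


p_l, r_direct, r_formula = check_equation_26()
```

In matrix language, (26) is the rank-one update formula for an inverse, which is why the correction involves only $p_L(x_0 \mid z, s)$, $p_L(z \mid x, s)$, and $p_L(z \mid z, s)$.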


We denote by $p(x_0 \mid x, u, t)$ the joint probability density for $x$ and $u$ at $t$ with fixed initial value of $x(\tau)$ ($x(0) = x_0$), so that

$$r(x_0 \mid x, t; \lambda) = \int_0^\infty e^{-\lambda u}\, p(x_0 \mid x, u, t)\, du$$

and we denote by $p_L(x_0 \mid x, u, s)$ the Laplace transform of $p$ with respect to $t$. We compute first $p_L(x_0 \mid x, u, s)$ by Laplace inversion of (26). To do this we have to split off the term which leads to a delta function in $p_L(x_0 \mid x, u, s)$ (unless its coefficient vanishes). This term corresponds to a delta function in $p(x_0 \mid x, u, t)$ and the coefficient of the delta function is the probability that $w^*(z) = 0$, which will occur with nonvanishing probability, for instance, if $x_0$ and $x$ are both smaller or larger than $z$.

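We, therefore, split off those terms in (26) which do not vanish in the limit $\lambda \to \infty$ and write

$$\begin{aligned} r_L(x_0 \mid x, s; \lambda) &= p_L(x_0 \mid x, s) - \frac{p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{p_L(z \mid z, s)} + \left[ \frac{p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{p_L(z \mid z, s)} - \frac{\lambda\, p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{1 + \lambda\, p_L(z \mid z, s)} \right] \\ &= \frac{p_L(x_0 \mid x, s)\, p_L(z \mid z, s) - p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{p_L(z \mid z, s)} + \frac{p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{p_L(z \mid z, s)\, (1 + \lambda\, p_L(z \mid z, s))}. \end{aligned} \qquad (27)$$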


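Taking the Laplace inverse with respect to $\lambda$ we obtain⁹

$$p_L(x_0 \mid x, u, s) = \frac{p_L(x_0 \mid x, s)\, p_L(z \mid z, s) - p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{p_L(z \mid z, s)}\, \delta_+(u) + \frac{p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{p_L^2(z \mid z, s)}\, e^{-u / p_L(z \mid z, s)} \qquad (28)$$

for $u \ge 0$ and zero otherwise, where $\delta_+(u)$ is defined by $\delta_+(u) = 0$ for $u \ne 0$ and

$$\int_0^{\epsilon} \delta_+(u)\, du = 1 \quad \text{for any } \epsilon > 0. \qquad (29)$$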
0 


⁹ Eq. (28) can be checked by comparing the moments of $u$ with those obtained directly. See Appendix II.


The conditional probability of finding $w^*(z)\, t$ in the interval $(u, u + du)$, if $x(0) = x_0$ and $x(t) = x$ is known, is thus given by $p(x_0 \mid x, u, t)\, du / p(x_0 \mid x, t)$ where $p(x_0 \mid x, u, t)$ is to be obtained by Laplace inversion from $p_L(x_0 \mid x, u, s)$, given by (28). The (unconditional) probability of finding $w^*(z)\, t$ in the interval $(u, u + du)$ is given by

$$p(u, t)\, du = du \iint w(x_0)\, p(x_0 \mid x, u, t)\, dx_0\, dx$$

and is to be obtained by Laplace inversion of

$$\begin{aligned} p_L(u, s) &= \iint w(x_0)\, p_L(x_0 \mid x, u, s)\, dx_0\, dx \\ &= \left[ \frac{1}{s} - \frac{w(z)}{s^2\, p_L(z \mid z, s)} \right] \delta_+(u) + \frac{w(z)}{s^2\, p_L^2(z \mid z, s)}\, e^{-u / p_L(z \mid z, s)}. \end{aligned} \qquad (30)$$


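This inversion will in general be too complicated to perform exactly, but an asymptotic evaluation for large $t$ can be carried through. We expect that the quantity $(u - wt)/\sqrt{t}$ becomes normally distributed in this limit. We thus consider the distribution

$$F(v, t) \equiv \text{prob}\{(u - wt)/\sqrt{t} < v\} = \begin{cases} 0 & \text{for } v < -w\sqrt{t} \\ \dfrac{1}{2\pi i} \displaystyle\int_{c - i\infty}^{c + i\infty} ds\, e^{st} \int_0^{wt + v\sqrt{t}} p_L(u, s)\, du & \text{for } v > -w\sqrt{t} \end{cases} \qquad (31)$$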


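where the path of integration has to be taken to the right of singularities of $p_L$. Using (30) and interchanging the order of integration yields

$$\begin{aligned} F(v, t) &= \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{st} \left[ \frac{1}{s} - \frac{w}{s^2 p_L}\, e^{-(wt + v\sqrt{t})/p_L} \right] ds \\ &= 1 - \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{ds}{s}\, \frac{w}{w + s(p_L - w/s)} \exp\left[ \frac{-v s \sqrt{t} + s^2 t\, (p_L - w/s)}{w + s(p_L - w/s)} \right] \end{aligned} \qquad (32)$$

for $v > -w\sqrt{t}$, where $w$ and $p_L$ stand for $w(z)$ and $p_L(z \mid z, s)$, respectively. With $\zeta = -i s \sqrt{t}$ and $r(z, s) = p_L - w/s$ we write this result in the form

$$F(v, t) = 1 - \frac{1}{2\pi i} \int \frac{d\zeta}{\zeta}\, \frac{w}{w + i\zeta\, r(z, i\zeta/\sqrt{t})/\sqrt{t}} \exp\left\{ \frac{-i\zeta v - \zeta^2\, r(z, i\zeta/\sqrt{t})}{w + i\zeta\, r(z, i\zeta/\sqrt{t})/\sqrt{t}} \right\} \qquad (33)$$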


where the path of integration has to be taken below any singularities of the integrand. If

$$\lim_{s \to 0} r(z, s) = r(z) \qquad (34)$$

exists at least for $\operatorname{Re} s > 0$ and is finite and larger than zero and if the limit $t \to \infty$ and the integration can be interchanged, then $F(v, t)$ approaches the normal distribution for $v > -w\sqrt{t}$:

$$F(v, \infty) = [4\pi\, r(z)\, w(z)]^{-1/2} \int_{-\infty}^{v} e^{-\eta^2 / 4 r(z) w(z)}\, d\eta. \qquad (35)$$

For the existence of the limit $r(z)$ it is sufficient that the stationary distribution is approached sufficiently fast so that

$$\int_0^\infty |\, p(z \mid z, t) - w(z)\, |\, dt \quad \text{exists.} \qquad (36)$$

The significance of the condition $r(z) > 0$ can be seen in various ways. We note first that the unconditional first and second moments of $u$ are, according to (51) in Appendix II,

$$\bar{u} = wt \qquad (37)$$

$$\overline{u^2} = 2w \int_0^t dt_2 \int_0^{t_2} p(z \mid z, t_2 - t_1)\, dt_1 = 2w \int_0^t dt_2 \int_0^{t_2} p(z \mid z, t')\, dt' = 2w \int_0^t p(z \mid z, t')\, (t - t')\, dt'. \qquad (38)$$

One has thus

$$\overline{u^2} - \bar{u}^2 = 2w \int_0^t \{p(z \mid z, t') - w\}\, (t - t')\, dt' \qquad (39)$$

or

$$t^{-1} (\overline{u^2} - \bar{u}^2) = 2w \int_0^t \{p(z \mid z, t') - w\} \left(1 - \frac{t'}{t}\right) dt'. \qquad (40)$$

In the limit $t \to \infty$, the second factor in the integrand is merely a convergence creating factor, so that if the integral converges with the first factor alone, i.e., a fortiori if the condition (36) is fulfilled,

$$\lim_{t \to \infty} t^{-1} (\overline{u^2} - \bar{u}^2) = 2w(z) \int_0^\infty \{p(z \mid z, t) - w(z)\}\, dt = 2w(z)\, r(z). \qquad (41)$$

This shows that for $w(z) \ne 0$, $r(z)$ must be at least non-negative.
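To make (36) and (41) concrete, the integral for $r(z)$ can be evaluated numerically for a specific process. The sketch below assumes, for illustration only, the stationary Gaussian Markoff (Uhlenbeck) process with unit variance and correlation $e^{-\beta|\tau|}$, for which $p(0 \mid 0, t) = [2\pi(1 - e^{-2\beta t})]^{-1/2}$, and takes $z = 0$. For this case the integral works out in closed form to $w(0) \ln 2 / \beta$; this value is a calculation made for this example rather than a statement from the paper. The substitution $t = v^2$ removes the integrable $t^{-1/2}$ singularity at the origin.

```python
import math


def r_at_zero(beta=1.0, v_max=5.0, n=200_000):
    """Midpoint-rule value of r(0) = integral of [p(0|0,t) - w(0)] dt, using t = v**2."""
    w = 1.0 / math.sqrt(2.0 * math.pi)
    h = v_max / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * h
        t = v * v
        # Return density at z = 0; expm1 keeps accuracy for very small t.
        p = 1.0 / math.sqrt(2.0 * math.pi * (-math.expm1(-2.0 * beta * t)))
        total += (p - w) * 2.0 * v * h     # dt = 2 v dv
    return total


r0 = r_at_zero()
w0 = 1.0 / math.sqrt(2.0 * math.pi)
variance_rate = 2.0 * w0 * r0              # limit of var(u)/t according to (41)
```

Since $w^*(z) = u/t$, the variance of the density estimate at $z = 0$ then falls off as `variance_rate / t` for large sample length $t$.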

From (26) or (30) one sees that

$$\int_0^\infty e^{-st}\, \text{prob}\{u \ne 0\}\, dt = w / s^2 p_L. \qquad (42)$$

We can consider prob $\{u \ne 0\}$ also as the probability that the time¹⁰ at which the first contribution to $u$ occurs is smaller than $t$. The first moment $\bar{t}$ is, therefore,

$$\bar{t} = -\left[ \frac{d}{ds} \left( \frac{w}{s\, p_L} \right) \right]_{s=0} = w^{-1} [p_L - w/s]_{s=0} = w^{-1}\, r(z). \qquad (43)$$

This shows that $r(z)$ is finite and positive as required, if $w \ne 0$ and if the average time at which the first contribution to $u$ occurs is finite and different from zero.

¹⁰ For a continuous process, this is the first-passage time.


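APPENDIX I

To derive (5) more formally we use the trivial identity

$$\exp\left\{ -\lambda \int_{t_0}^{t} \Phi(X(\tau), \tau)\, d\tau \right\} = 1 - \lambda \int_{t_0}^{t} dt'\, \Phi(X(t'), t') \exp\left\{ -\lambda \int_{t_0}^{t'} \Phi(X(\tau), \tau)\, d\tau \right\}. \qquad (44)$$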
| to 

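By averaging both sides with initial and end point of $X(\tau)$ fixed, and multiplying by $p(X_0, t_0 \mid X, t)$ one obtains from the definition (4)

$$r(X_0, t_0 \mid X, t; \lambda) = p(X_0, t_0 \mid X, t) - \lambda\, p(X_0, t_0 \mid X, t) \int_{t_0}^{t} dt' \left\langle \Phi(X(t'), t') \exp\left\{ -\lambda \int_{t_0}^{t'} \Phi(X(\tau), \tau)\, d\tau \right\} \,\Big|\, X(t_0) = X_0,\ X(t) = X \right\rangle_{\text{av}}. \qquad (45)$$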
In the second term, we now introduce a third condition $X(t') = X'$ and compensate for this by multiplication with $p(X_0, t_0; X, t \mid X', t')\, dX'$ and integration over $X'$, where $p(X_0, t_0; X, t \mid X', t')\, dX'$ is defined as the probability of finding $X(t')$ in $dX'$ at $X'$ for a path with fixed initial and end point $X(t_0) = X_0$ and $X(t) = X$, respectively. The second term thus becomes

$$-\lambda\, p(X_0, t_0 \mid X, t) \int_{t_0}^{t} dt' \int dX'\, p(X_0, t_0; X, t \mid X', t') \left\langle \Phi(X(t'), t') \exp\left\{ -\lambda \int_{t_0}^{t'} \Phi(X(\tau), \tau)\, d\tau \right\} \,\Big|\, X(t_0) = X_0,\ X(t') = X',\ X(t) = X \right\rangle_{\text{av}}. \qquad (46)$$

We can now take $\Phi(X(t'), t')$ out of the average symbol as $\Phi(X', t')$, since $X(t') = X'$ is held fixed. We also can omit the condition $X(t) = X$ in the average symbol since, by virtue of the Markoff property, $X(t)$ is statistically independent of values of $X(t'')$ for $t'' < t'$, if $X(t')$ is held fixed. Also from the Markoff property and the definition of conditional probabilities follows for $t_0 < t' < t$,

$$p(X_0, t_0; X, t \mid X', t') = p(X_0, t_0 \mid X', t')\, p(X', t' \mid X, t) / p(X_0, t_0 \mid X, t). \qquad (47)$$


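We thus have

$$r(X_0, t_0 \mid X, t; \lambda) = p(X_0, t_0 \mid X, t) - \lambda \int_{t_0}^{t} dt' \int dX' \left\langle \exp\left\{ -\lambda \int_{t_0}^{t'} \Phi(X(\tau), \tau)\, d\tau \right\} \,\Big|\, X(t_0) = X_0,\ X(t') = X' \right\rangle_{\text{av}} p(X_0, t_0 \mid X', t')\, \Phi(X', t')\, p(X', t' \mid X, t) \qquad (48)$$

and using the definition of $r(X_0, t_0 \mid X', t'; \lambda)$ on the right-hand side, we obtain (5). By a similar argument starting with the identity

$$\exp\left\{ -\lambda \int_{t_0}^{t} \Phi(X(\tau), \tau)\, d\tau \right\} = 1 - \lambda \int_{t_0}^{t} dt'\, \Phi(X(t'), t') \exp\left\{ -\lambda \int_{t'}^{t} \Phi(X(\tau), \tau)\, d\tau \right\} \qquad (49)$$

one obtains (6).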
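APPENDIX II

Eq. (28) can be checked directly by the method of moments. Since for integer $n \ge 1$

$$u^n = \left[ \int_0^t dt_1\, \frac{1}{\Delta} \int_{z - \Delta/2}^{z + \Delta/2} \delta(x(t_1) - z_1)\, dz_1 \right]^n = n! \int_0^t dt_n \int_0^{t_n} dt_{n-1} \cdots \int_0^{t_2} dt_1 \prod_{j=1}^{n} \frac{1}{\Delta} \int_{z - \Delta/2}^{z + \Delta/2} \delta(x(t_j) - z_j)\, dz_j \qquad (50)$$

one has

$$\begin{aligned} \langle u^n \mid x(0) = x_0,\ x(t) = x \rangle_{\text{av}}\, p(x_0 \mid x, t) &= n! \int_0^t dt_n \int_0^{t_n} dt_{n-1} \cdots \int_0^{t_2} dt_1\, \Delta^{-n} \int_{z - \Delta/2}^{z + \Delta/2} \cdots \int_{z - \Delta/2}^{z + \Delta/2} p(x_0 \mid z_1, t_1) \prod_{j=1}^{n-1} p(z_j \mid z_{j+1}, t_{j+1} - t_j)\; p(z_n \mid x, t - t_n) \prod_{j=1}^{n} dz_j \\ &= n! \int_0^t dt_n\, p(z \mid x, t - t_n) \int_0^{t_n} dt_{n-1}\, p(z \mid z, t_n - t_{n-1}) \cdots \int_0^{t_2} dt_1\, p(z \mid z, t_2 - t_1)\, p(x_0 \mid z, t_1), \end{aligned} \qquad (51)$$

where the second form holds in the limit $\Delta \to 0$. Taking Laplace transforms one has

$$\int_0^\infty e^{-st}\, dt\, \langle u^n \mid x(0) = x_0,\ x(t) = x \rangle_{\text{av}}\, p(x_0 \mid x, t) = n!\, p_L(x_0 \mid z, s)\, p_L(z \mid x, s)\, p_L(z \mid z, s)^{n-1}. \qquad (52)$$

This is in agreement with Laplace transforms of the moments obtained by integration with respect to $u$ from (28).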
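The agreement can be seen in one line. The following worked step is an addition for the reader's convenience; it uses only the form of (28), in which the $\delta_+(u)$ term does not contribute to moments with $n \ge 1$, and the elementary integral $\int_0^\infty u^n e^{-u/b}\, du = n!\, b^{n+1}$.

```latex
\int_{0}^{\infty} u^{n}\, p_L(x_0 \mid x, u, s)\, du
  = \frac{p_L(x_0 \mid z, s)\, p_L(z \mid x, s)}{p_L^{2}(z \mid z, s)}
    \int_{0}^{\infty} u^{n}\, e^{-u / p_L(z \mid z, s)}\, du
  = n!\; p_L(x_0 \mid z, s)\, p_L(z \mid x, s)\, p_L(z \mid z, s)^{\,n-1},
  \qquad n \ge 1 .
```

This coincides with the right-hand side of (52).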




A Systematic Approach to a Class of Problems in the 
Theory of Noise and Other Random Phenomena 


Part II, Examples*

ARNOLD J. F. SIEGERT†


Summary: The method of Part I is applied to the problem of finding the probability distribution of $u = \int_0^t K(\tau)\, x^2(\tau)\, d\tau$, where $K(\tau)$ is a given function and $x(\tau)$ is the Uhlenbeck process. The earlier methods of Kac and the author yielded the characteristic function of this distribution as the reciprocal square root of the Fredholm determinant $D$ of an integral equation. The present method yields a second-order linear differential equation with initial condition only for $D$ as function of $t$. For the special cases $K(\tau) = 1$ and $K(\tau) = e^{-\alpha\tau}$ the characteristic function is obtained in closed form.

In Section III, we have verified directly from the integral equation 
the differential equation for D and some relations between D and 
the initial and end point values of the Volterra reciprocal kernel 
which appear in the joint characteristic function for u, x(0) and x(t). 


Section I

In previous papers,¹ Darling and the author derived two integral equations for a function closely related to the characteristic function of the probability distribution of the functional $\int_{t_0}^{t} \Phi(X(\tau), \tau)\, d\tau$, where the components $x_j(\tau)$ of $X(\tau)$ form a Markoff process and $\Phi(X, \tau)$ is a given function of $X$ and $\tau$. For an important class of Markoff processes, the solution of these integral equations was shown to be the principal solution of a partial differential equation similar to the Fokker-Planck equation. We also derived an integral equation relating two solutions of the problem with two functions, $\Phi_2(X, \tau)$ and $\Phi_1(X, \tau)$, which can be used for a perturbation calculation to obtain solutions of the problem when the solution for $\Phi_1(X, \tau)$ is known, and the solution for a function $\Phi_2(X, \tau)$ "in the neighborhood" of $\Phi_1(X, \tau)$ is desired.

Since the integral equations as well as the differential equations derived in this previous paper are rather formidable, it seemed of interest to consider first some cases for which the problem could be solved or at least be reduced to an integral equation in a single variable by an older method.²,³ In the case $\Phi(X, \tau) = K(\tau)\, x^2(\tau)$,


* Manuscript received by the PGIT, August 13, 1956. Most of this work was done while the author was a consultant for The RAND Corp., Santa Monica, Calif.

† Dept. of Physics, Northwestern University, Evanston, Ill.

¹ D. A. Darling and A. J. F. Siegert, "On the Distribution of Certain Functionals of Markoff Processes," The RAND Corp., Paper P-429; October 31, 1953, and Part I of this paper (P-738), "A Systematic Approach to a Class of Problems in the Theory of Noise and Other Random Phenomena;" September, 1955. Darling and Siegert, "On the distribution of certain functionals of Markoff chains and processes," Proc. Natl. Acad. Sci., vol. 42, pp. 525-529; August, 1956.

² M. Kac and A. J. F. Siegert, "Note on the theory of noise in receivers with square law detector," Phys. Rev., vol. 70, p. 449; September, 1946.

Kac and Siegert, "On the theory of noise in radio receivers with square law detector," J. Appl. Phys., vol. 18, pp. 383-397; April, 1947.

³ A. J. F. Siegert, "Passage of stationary processes through linear and non-linear devices," IRE Trans., vol. PGIT-3, pp. 4-25; March, 1954. (The RAND Corp., (P-419); October 29, 1953.)


where $x(\tau)$ is a Gaussian random function with arbitrary autocorrelation function $\rho(\tau)$, the older method applies and the problem can be reduced to the problem of solving either the homogeneous integral equation²


i oli — DEQ pris = Aes (1 


or the inhomogeneous integral equation³

$$g_\lambda(\tau_1, \tau_2) + 2\lambda \int_0^t \rho(\tau_1 - \tau)\, K(\tau)\, g_\lambda(\tau, \tau_2)\, d\tau = \rho(\tau_1 - \tau_2). \qquad (2)$$


If, specially, the function $x(\tau)$ is a component $x_j(\tau)$ of a Markoffian Gaussian process, the method referred to¹ applies and leads to the partial differential equation [(15) of Part I]:

$$(L - \lambda K(t)\, x^2)\, r = \frac{\partial r}{\partial t} \qquad (3)$$

if the transition probability density $p(X_0 \mid X, t)$ is the principal solution of the differential equation

$$L p = \frac{\partial p}{\partial t} \qquad (4)$$
where $L$ operates on the components of $X$. One sees easily that in the case of Gaussian $p(X_0 \mid X, t)$ the additional term in (3) does not seriously complicate the differential equation and that $r$ is still of the Gaussian form. For the coefficients of this Gaussian, one obtains a system of differential equations of first order and second degree with $t$ as the independent variable with initial conditions only. The equivalence of this system in which $t$ appears as the independent variable and the integral equation (2) in which $t$ appears only parametrically is not trivial even though (2) can be reduced to a differential equation with appropriate boundary conditions when $\rho(\tau)$ is the auto-correlation function of a Markoffian Gaussian process. We have, therefore, in Section II worked out in detail the case of the one-dimensional, Markoffian Gaussian random process (Uhlenbeck process).⁴ As special examples, we give in closed form the characteristic functions of $\int_0^t x^2(\tau)\, d\tau$ and $\int_0^t e^{-\alpha\tau} x^2(\tau)\, d\tau$. The latter represents the output of a receiver consisting of single-tuned IF and audio stage with quadratic detector, with

⁴ Analogous problems for the Wiener process have been treated independently by R. Deutsch, "Piecewise quadratic detector," 1956 IRE Convention Record, Part 4, pp. 15-20.


white noise input turned on a time $t$ before observation. In Section III, we have shown for this case how the system of differential equations originally obtained by the method¹ follows directly from (2). The purpose of this derivation was primarily to show that some of the equations derived by the method¹ remain valid when the auto-correlation function of the Uhlenbeck process [$\rho(\tau_1 - \tau_2) = \exp(-\beta |\tau_1 - \tau_2|)$] is replaced by a general function $\rho(\tau_1, \tau_2)$. We found, e.g., a simpler expression for the characteristic function $f = \langle \exp[-\lambda \int_0^t K(\tau)\, x^2(\tau)\, d\tau] \rangle_{\text{av}}$ where $x(\tau)$ is a Gaussian random function with arbitrary correlation function $\rho(\tau_1 - \tau_2)$. We had shown³ that the function $f$ can be expressed in terms of the trace of the solution $g_\lambda(\tau_1, \tau_2)$ of the integral equation (2) (which depends parametrically on $t$) by

$$f = \exp\left[ -\int_0^\lambda d\kappa \int_0^t K(\tau)\, g_\kappa(\tau, \tau)\, d\tau \right].$$

This equation is suggested by the well-known expression of the Fredholm determinant in terms of the Volterra reciprocal function. We now obtained the simpler expression

$$f = \exp\left[ -\lambda \int_0^t K(t')\, g_\lambda^{(t')}(t', t')\, dt' \right]$$

where $g_\lambda^{(t')}(\tau_1, \tau_2)$ is the solution of (2) with the upper limit replaced by $t'$ so that $g_\lambda^{(t')}(t', t')$ depends on $t'$ implicitly also. [See (58).]

The results¹ and the present paper raise an interesting question for further investigation. In Kac and Siegert,² the fact that the problem of the probability distribution of a quadratic integral form of a Gaussian process $x(t)$ could be reduced to the solution of an integral equation involving as variable only the time was clearly a consequence of the fact that the joint probability distribution of $x(t_1), x(t_2), \ldots, x(t_n)$ is the exponential function of a quadratic form. The results¹ show that this simplification can also be understood as a consequence of the fact that the differential equation (4) is not essentially complicated by the addition of a quadratic term. One may thus expect to find for Markoff processes other than the Gaussian processes certain functionals for which the problem of finding the characteristic function reduces to differential equations with only the time as independent variable.


Section II

In this section we will apply the method¹ to the problem of evaluating the function

$$r(x_0 \mid x, t, \lambda) = \left\langle \exp\left[ -\lambda \int_0^t K(\tau)\, x^2(\tau)\, d\tau \right] \,\Big|\, x(0) = x_0,\ x(t) = x \right\rangle_{\text{av}} \cdot\, p(x_0 \mid x, t) \qquad (5)$$

where $x(\tau)$ is the stationary one-dimensional Markoffian Gaussian process (Uhlenbeck process), which is described by the transition probability density

$$p(x_0 \mid x, t) = [2\pi (1 - e^{-2\beta t})]^{-1/2} \exp\left[ -\frac{(x - x_0 e^{-\beta t})^2}{2 (1 - e^{-2\beta t})} \right] \qquad (6)$$


with constant $\beta$, $K(\tau)$ is a given function, $\lambda$ is to be chosen positive if $K(\tau) \ge 0$ and negative imaginary if $K(\tau)$ can assume negative values. The symbol $\langle\ \mid\ \rangle_{\text{av}}$ denotes the average of the functional to the left of the vertical line under the conditions written to the right of the vertical line. The extension of the result to, e.g., the characteristic function for $\int_0^t K(\tau) \sum_k x_k^2(\tau)\, d\tau$ where $x_k(\tau)$ are independent Uhlenbeck processes is trivial.

This problem can be treated by the method² and the result can be brought into a slightly more convenient form by an extension of this method.³

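It is convenient to work with the function

$$\begin{aligned} r^*(\eta, \zeta, t, \lambda) &= (2\pi)^{-1/2} \iint \exp\left[ -\frac{x_0^2}{2} + i\eta x_0 + i\zeta x \right] r(x_0 \mid x, t, \lambda)\, dx_0\, dx \\ &= \left\langle \exp\left\{ i[\eta\, x(0) + \zeta\, x(t)] - \lambda \int_0^t K(\tau)\, x^2(\tau)\, d\tau \right\} \right\rangle_{\text{av}} \end{aligned} \qquad (7)$$

from which $r(x_0 \mid x, t, \lambda)$ is easily obtained by Fourier inversion and multiplication by $(2\pi)^{1/2} e^{x_0^2/2}$. Note that the characteristic function for the distribution of $\int_0^t K(\tau)\, x^2(\tau)\, d\tau$ without initial and end conditions is simply obtained by choosing $\eta = \zeta = 0$.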

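The result obtained by using the methods²,³ is

$$r^*(\eta, \zeta, t, \lambda) = \exp\left\{ -\int_0^\lambda d\kappa \int_0^t K(\tau)\, g_\kappa(\tau, \tau)\, d\tau - \frac{1}{2} \left[ \eta^2 g_\lambda(0, 0) + 2\eta\zeta\, g_\lambda(0, t) + \zeta^2 g_\lambda(t, t) \right] \right\} \qquad (8)$$

where $g_\lambda(\tau_1, \tau_2)$ is the solution of the integral equation

$$g_\lambda(\tau_1, \tau_2) + 2\lambda \int_0^t K(\tau)\, \rho(\tau_1 - \tau)\, g_\lambda(\tau, \tau_2)\, d\tau = \rho(\tau_1 - \tau_2). \qquad (9)$$

(See Appendix II.)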

This result is valid for any stationary Gaussian process with auto-correlation function $\rho(\tau)$. The variable $\tau_2$ appears only as a parameter in the integral equation. If $x(\tau)$ represents the output of a network with lumped circuit and with white noise input, the integral equation reduces to a differential equation. For the Uhlenbeck process, one has specially $\rho(\tau) = e^{-\beta|\tau|}$, and one would reduce the integral equation to a second-order linear differential equation with appropriate boundary conditions with $\tau_1$ as the independent variable.

It is interesting to see how the new method leads to a Riccati equation for $g_\lambda(t, t)$, with $t$ as the independent variable. This Riccati equation is of course also equivalent to a second-order linear differential equation, which, however, is not the same as the differential equation obtained for $g_\lambda(\tau_1, \tau_2)$ as function of $\tau_1$.


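The new method leads to the differential equation

$$\frac{\partial r}{\partial t} = \beta \left[ \frac{\partial^2 r}{\partial x^2} + \frac{\partial (x r)}{\partial x} \right] - \lambda K x^2 r \qquad (10)$$

since $p$ satisfies

$$\frac{\partial p}{\partial t} = \beta \left[ \frac{\partial^2 p}{\partial x^2} + \frac{\partial (x p)}{\partial x} \right]. \qquad (11)$$

From this one obtains

$$\frac{\partial r^*}{\partial t} = -\beta \zeta \left[ \frac{\partial r^*}{\partial \zeta} + \zeta r^* \right] + \lambda K \frac{\partial^2 r^*}{\partial \zeta^2}. \qquad (12)$$

Making the Ansatz

$$r^* = f \exp[-\tfrac{1}{2} (\sigma_0 \eta^2 + 2\sigma \eta\zeta + \sigma_1 \zeta^2)] \qquad (13a)$$

one obtains (with dots indicating differentiation with respect to $t$)

$$\frac{\dot{f}}{f} - \frac{1}{2} (\dot{\sigma}_0 \eta^2 + 2\dot{\sigma} \zeta\eta + \dot{\sigma}_1 \zeta^2) = -\beta \zeta (-\sigma\eta - \sigma_1 \zeta + \zeta) + \lambda K [(\sigma\eta + \sigma_1 \zeta)^2 - \sigma_1] \qquad (13b)$$

and by comparing coefficients

$$\frac{d \ln f}{dt} = -\lambda K \sigma_1 \qquad (14a)$$

$$\dot{\sigma}_1 = 2\beta (1 - \sigma_1) - 2\lambda K \sigma_1^2 \qquad (14b)$$

$$\dot{\sigma}_0 = -2\lambda K \sigma^2 \qquad (14c)$$

$$\frac{d \ln \sigma}{dt} = -\beta - 2\lambda K \sigma_1. \qquad (14d)$$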


Since $r(x_0 \mid x, 0, \lambda) = \delta(x - x_0)$, we have

$$\begin{aligned} r^*(\eta, \zeta, 0, \lambda) &= (2\pi)^{-1/2} \iint \exp\left[ -\frac{x_0^2}{2} + i\eta x_0 + i\zeta x \right] \delta(x - x_0)\, dx\, dx_0 \\ &= (2\pi)^{-1/2} \int \exp\left[ -\frac{x^2}{2} + i(\eta + \zeta) x \right] dx \\ &= \exp(-\tfrac{1}{2} (\eta + \zeta)^2). \end{aligned} \qquad (15)$$

The initial conditions for the differential equations (14a) to (14d) are, therefore,

$$f(0) = \sigma_0(0) = \sigma(0) = \sigma_1(0) = 1. \qquad (16)$$

We note that $\sigma_0$, $\sigma$, and $f$ are obtained by quadratures, if $\sigma_1$ has been found, which means that $g_\lambda(0, 0)$, $g_\lambda(0, t)$ and $\int_0^\lambda d\kappa \int_0^t K(\tau)\, g_\kappa(\tau, \tau)\, d\tau$ are obtained by quadratures from $g_\lambda(t, t)$. It should be remembered that these relations need hold only for the special choice $\rho(\tau) = e^{-\beta|\tau|}$ since a stationary Gaussian process is Markoffian only if it has this special auto-correlation function, and (10) is based on the Markoff property. We will show in the following Section III, however, that (14a) and (14c) are independent of this special form of $\rho$.


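It will generally be more convenient to convert the Riccati equation (14b) into a second-order linear differential equation. With the substitutions

$$x = 2\lambda \int K\, dt \qquad (17)$$

and

$$\sigma_1 = u'/u \qquad (18)$$

where prime denotes differentiation with respect to $x$ we get

$$\frac{d\sigma_1}{dx} = \frac{\beta}{\lambda K} (1 - \sigma_1) - \sigma_1^2 \qquad (19)$$

$$\frac{u''}{u} - \left( \frac{u'}{u} \right)^2 = \frac{\beta}{\lambda K} \left( 1 - \frac{u'}{u} \right) - \left( \frac{u'}{u} \right)^2 \qquad (20)$$

or

$$u'' + \frac{\beta}{\lambda K} (u' - u) = 0. \qquad (21)$$

One initial condition is obtained from $\sigma_1(0) = 1$:

$$u'(x_0) = u(x_0) \qquad (22a)$$

which simplifies to

$$u''(x_0) = 0 \qquad (22b)$$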


where $x_0$ is the value assumed by $x$ for $t = 0$. This determines $u(x)$ except for a constant factor which is irrelevant since all results depend only on $\sigma_1 = u'/u$. Of special interest is the characteristic function for the unconditional distribution, $f$. We obtain from (14a)

$$\frac{d \ln f}{dx} = \frac{1}{2\lambda K} \frac{d \ln f}{dt} = -\frac{1}{2} \sigma_1 = -\frac{1}{2} \frac{d \ln u}{dx} \qquad (23)$$

or

$$f^2 = \frac{u(x_0)}{u(x)}. \qquad (24)$$

Since $u(x_0)$ is essentially the Wronskian, it can always be written in a convenient form (see Appendix I).

The present method thus presents the characteristic function in terms of the solution of a differential equation with initial condition rather than through an eigenvalue problem or an inhomogeneous integral equation.

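For $\sigma$ and $\sigma_0$ we obtain from (14c), (14d), and (16)

$$d \ln \sigma = -\beta\, dt - \sigma_1\, dx \qquad (25)$$

$$\ln \sigma = -\beta\, t(x) - \ln u + \text{const} \qquad (25a)$$

$$\sigma = e^{-\beta t}\, u(x_0) / u(x) \qquad (26)$$

and

$$\frac{d\sigma_0}{dx} = -\sigma^2 = -e^{-2\beta t}\, u^2(x_0) / u^2(x) \qquad (27)$$

or

$$\sigma_0 = 1 - u^2(x_0) \int_{x_0}^{x} e^{-2\beta t(y)}\, u^{-2}(y)\, dy. \qquad (28)$$

In Kac and Siegert,² solutions in terms of infinite products were given for the two cases $K(\tau) = 1$, and $K(\tau) = e^{-\alpha\tau}$ with $t = \infty$. We will give here the two solutions in closed form with the second case for general $t$.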
Nase 12K) = 1. 


| 


| Bu’ + uw —u=0 (29) 
[. : 
at == 2K: (30) 
c general solution is 
| tin Oo (Ge Abe”) (31) 
I, 
| k= V(6/2d)" + B/d (32) 
le initial condition (22a) requires 
oS = "0 + 8/20). (33) 
2 thus have from (24) 
ap he 
(a + b) cosh cx + (a — 6) sinh xx 
B= e°*/** (cosh xx + «(1 + B/2A) sinh xx)7’””. (34) 
lith 
n= V1 + 4d/6 (35) 
ls becomes” 
f =e” (cosh Bnt + 1? (1 + 2X/8) sinh Bnt)'”. (36) 
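The closed form for Case 1 can be checked numerically against the differential equations it was derived from. The sketch below is only an illustration: it assumes the Riccati equation has the form $d\sigma_t/dt = 2\beta(1 - \sigma_t) - 2\lambda K \sigma_t^2$ with $\sigma_t(0) = 1$ (the form implied by Section III), takes $d \ln f/dt = -\lambda K \sigma_t$ from (14a), and sets $K = 1$. The function names `closed_form_f` and `riccati_f` are hypothetical.

```python
import math

def closed_form_f(lam, beta, t):
    """Closed-form characteristic function for K(t) = 1 (Case 1)."""
    eta = math.sqrt(1.0 + 4.0 * lam / beta)
    bracket = (math.cosh(beta * eta * t)
               + (1.0 + 2.0 * lam / beta) / eta * math.sinh(beta * eta * t))
    return math.exp(beta * t / 2.0) / math.sqrt(bracket)

def riccati_f(lam, beta, t, steps=2000):
    """Integrate the assumed Riccati equation for sigma_t together with
    d(ln f)/dt = -lam * sigma_t by classical fourth-order Runge-Kutta,
    starting from sigma_t = 1 and f = 1 at t = 0."""
    def rhs(sigma):
        d_sigma = 2.0 * beta * (1.0 - sigma) - 2.0 * lam * sigma * sigma
        d_ln_f = -lam * sigma
        return d_sigma, d_ln_f

    h = t / steps
    sigma, ln_f = 1.0, 0.0
    for _ in range(steps):
        k1s, k1f = rhs(sigma)
        k2s, k2f = rhs(sigma + 0.5 * h * k1s)
        k3s, k3f = rhs(sigma + 0.5 * h * k2s)
        k4s, k4f = rhs(sigma + h * k3s)
        sigma += h * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        ln_f += h * (k1f + 2.0 * k2f + 2.0 * k3f + k4f) / 6.0
    return math.exp(ln_f)

if __name__ == "__main__":
    for lam, beta, t in [(0.7, 1.3, 2.0), (0.1, 0.5, 5.0)]:
        print(lam, beta, t, closed_form_f(lam, beta, t), riccati_f(lam, beta, t))
```

Agreement of the two columns for several parameter choices is a useful sanity check on the signs in the exponent and in the hyperbolic bracket.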


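The roots $\lambda_m$ of $f$ are determined by the roots $\eta_m$ of

$$ \tanh \beta\eta t = -\frac{2\eta}{1 + \eta^2} = -\frac{\eta}{1 + 2\lambda/\beta} \tag{37} $$

and

$$ \lambda_m = \frac{\beta}{4}\,(\eta_m^2 - 1). \tag{38} $$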


, Kac and Siegert,’ the meaning of a and @ is inter- 
manged and X, is in our present notation given by 


i! 
IN ae OW: (39) 
so that we have
2 
Ne ee 40 
i) e) 
Vith » = zy and yn, = — Ym we then have agreement 


with equations (7.35) and (7.36) of Kac and Siegert.²

Case 2: $K(t) = e^{-\alpha t}$.


5 In Kac and Siegert (footnote 2), $f^2$ was given in product form, since we treated there the envelope detector.


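In this case we get from (17)

$$ x = -(2\lambda/\alpha)\,e^{-\alpha t} \tag{41} $$

since it turns out to be convenient to choose the integration constant equal to zero. We thus have

$$ x_0 = -2\lambda/\alpha \tag{42} $$

and

$$ \lambda K(t) = -\tfrac{1}{2}\,\alpha x. \tag{43} $$

Eq. (21) becomes

$$ u'' - \frac{2\beta}{\alpha x}\,(u' - u) = 0 \tag{44} $$

and has the solution⁶

$$ u = x^{p/2}\,Z_p(\sqrt{\gamma x}) \tag{45} $$

where $Z_p(y)$ is a Bessel function of order

$$ p = 1 + 2\beta/\alpha \tag{46a} $$

and

$$ \gamma = 8\beta/\alpha. \tag{46b} $$

With the aid of the identity⁷

$$ \frac{d}{dx}\left(x^{p/2} Z_p(\sqrt{\gamma x})\right) = \tfrac{1}{2}\sqrt{\gamma}\;x^{(p-1)/2} Z_{p-1}(\sqrt{\gamma x}) \tag{47} $$

the initial condition is seen to be satisfied by

$$ u(x) = x^{p/2}\left[N_{p-2}(\sqrt{\gamma x_0})\,J_p(\sqrt{\gamma x}) - J_{p-2}(\sqrt{\gamma x_0})\,N_p(\sqrt{\gamma x})\right] \tag{48} $$

where $J_p$ is the ordinary Bessel function and $N_p$ is the Neumann function

$$ N_p = (J_p \cos \pi p - J_{-p})/\sin \pi p. \tag{49} $$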


The function u(x.) must simplify since it is essentially a 
Wronskian and we have using (47) 


Led 
Vy dio 


(p—1)/2 p/2 
[xo N,-1%0 J, cc 


to Uae) 


CR? fe aa Nl 


= ea a eo a 2 \ 
Vy dato \°° TV xo 


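(50)

where the second equation follows from an identity.⁸ We thus get

$$ u(x_0) = 4\gamma^{-1}\pi^{-1}(p - 1)\,x_0^{(p-2)/2} \tag{51} $$

and from (24) for the unconditional characteristic function

$$ f(\lambda) = \left\{\frac{\pi\gamma}{4(p - 1)}\;x_0^{-(p-2)/2}\,x^{p/2}\left[N_{p-2}(\sqrt{\gamma x_0})\,J_p(\sqrt{\gamma x}) - J_{p-2}(\sqrt{\gamma x_0})\,N_p(\sqrt{\gamma x})\right]\right\}^{-1/2} \tag{52} $$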


6 E. Jahnke and F. Emde, "Tables of Functions," Dover Publications, New York, N.Y., 4th ed., p. 146, sec. 7, second equation; 1945.

7 Ibid., p. 145, sec. 5, fourth equation.

8 Ibid., p. 144, sec. 4, third equation.


where $x$, $x_0$, $p$, and $\gamma$ are given by (41), (42), (46a), and (46b).

For $\alpha \to 0$ this must reduce to (36). For $t \to \infty$, $x \to 0$ and the first term in the bracket becomes negligible. We then have


w?N, (Vy) & —a"(-V)? TO) 
Pe) ye Gay) Se ea) LO 


The product form² is then checked by using the Weierstrass product⁹


(ya) I, V9) = FT (1 = He) / re - 


(55) 


(53) 


(54) 


which yields 


co) = 10-8) = 104+ 


n n 


—1/2 
46%) (56) 


a Yn 


where the numbers $y_n$ are the roots of $J_{p-2}(y)$.

In Kae and Siegert,” the probability density P(V) was 
obtained for the random variable B [% e (ai (r) + 
v(r))dr, where 2x,(r) and 2,(r) are two independent 
Uhlenbeck processes with 2} = a; = 1/2 and auto- 
correlation function e °'"'. We get for this the result 
(after interchanging a and 6 to conform with the 
notation’): 


PIV) = siz fb 7%() ad/28 
(57) 
ee UN lie scite ( LZ Sat)" 
iy oes f dy I] , By, 


where $y_n$ is the $n$th root of $J_{(2\alpha/\beta)-1}(y) = 0$, in agreement with equations (4.45) and (7.19) of Kac and Siegert.²


SECTION III


It seemed of interest to verify the differential equations (14a) to (14d) directly from the integral equation (9) and to show that (14a) and (14c) remain valid when $\rho$ is replaced by a general symmetric kernel $h(\tau_1, \tau_2)$. We will here only outline the derivation; the details are given in a RAND Corp. paper.¹⁰

From the integral equation

$$ g(\tau_1, \tau_2) + 2\lambda \int_0^t h(\tau_1, \tau)\,K(\tau)\,g(\tau, \tau_2)\,d\tau = h(\tau_1, \tau_2) \tag{58} $$

one obtains an integral equation for $\partial g(\tau_1, \tau_2)/\partial t$ and an integral equation for $g(\tau_1, t)$. Comparing these two integral equations, whose solutions can be shown to be unique, one obtains

$$ \partial g(\tau_1, \tau_2)/\partial t = -2\lambda K(t)\,g(\tau_1, t)\,g(t, \tau_2). \tag{59} $$

Since $g(\tau_1, \tau_2)$ is symmetric because of the symmetry of the kernel, (59) yields

$$ dg(0, 0)/dt = -2\lambda K(t)\,g^2(0, t) \tag{60} $$

which proves (14c).

9 G. N. Watson, "A Treatise on the Theory of Bessel Functions," Cambridge University Press, Cambridge, Eng., 2nd ed., p. 498; 1952.

10 A. J. F. Siegert, "A Systematic Approach to a Class of Problems in the Theory of Noise and Other Random Phenomena. II. Examples," The RAND Corp., Paper P-730, sec. 3; September 1955. In this paper, the function $h$ in (3.8), p. 16, should be replaced by $f$. The function $g$ in (3.21), p. 19, should be replaced by $g^* = g - \rho$. The right-hand sides of (3.23), p. 19, and (3.24), p. 20, should have positive signs, and the right-hand side of (3.25), p. 20, should be $\beta[2 - g(t, t)]$.
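From (87) of Appendix II follows¹¹

$$ \ln f = -\int_0^{\lambda} d\kappa \int_0^t d\tau\,K(\tau)\,g_\kappa(\tau, \tau) \tag{61} $$

where $g_\kappa(\tau, \tau)$ is the solution of (58) with $\lambda$ replaced by $\kappa$. Differentiation with respect to $t$ and use of (59) yields

$$ d \ln f/dt = -K(t)\int_0^{\lambda}\left[g_\kappa(t, t) - 2\kappa\,g_\kappa^{(2)}(t, t)\right] d\kappa \tag{62} $$

where $g_\kappa^{(2)}$ is defined by

$$ g_\kappa^{(2)}(\tau_1, \tau_2) = \int_0^t g_\kappa(\tau_1, \tau)\,K(\tau)\,g_\kappa(\tau, \tau_2)\,d\tau. \tag{63} $$

From (58) one obtains integral equations for $g_\kappa^{(2)}(\tau_1, \tau_2)$ and $\partial g_\kappa(\tau_1, \tau_2)/\partial\kappa$. Comparing these one finds

$$ g_\kappa^{(2)}(\tau_1, \tau_2) = -\tfrac{1}{2}\,\partial g_\kappa(\tau_1, \tau_2)/\partial\kappa. \tag{64} $$

Substituting this result in (62) yields

$$ d \ln f/dt = -\lambda K(t)\,g_\lambda(t, t) \tag{65} $$

and proves (14a).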
The specific form of the kernel, $h(\tau_1, \tau_2) = e^{-\beta|\tau_1 - \tau_2|}$, is needed for the verification of (14b) and (14d). The functions $g(\tau_1, \tau_2)$ and $e^{-\beta|\tau_1 - \tau_2|}$ have a discontinuous derivative at $\tau_1 = \tau_2$, but we can obtain $dg(t, t)/dt$ from

$$ dg(t, t)/dt = dg^*(t, t)/dt = [\partial g^*(\tau_1, \tau_2)/\partial t]_{\tau_1 = \tau_2 = t} + [\partial g^*(\tau_1, t)/\partial\tau_1]_{\tau_1 = t} + [\partial g^*(t, \tau_2)/\partial\tau_2]_{\tau_2 = t} \tag{66} $$

where $g^*$ is defined by

$$ g^*(\tau_1, \tau_2) = g(\tau_1, \tau_2) - e^{-\beta|\tau_1 - \tau_2|}. \tag{67} $$

The first term in (66) is evaluated by means of (59). The second term contains $[\partial g(\tau_1, t)/\partial\tau_1]_{\tau_1 = t}$, which can be evaluated by differentiation of (58); one obtains

$$ [\partial g(\tau_1, t)/\partial\tau_1]_{\tau_1 = t} = \beta[2 - g(t, t)]. \tag{68} $$

The third term of (66) is equal to the second term by symmetry. Subtracting the corresponding differential quotients of $e^{-\beta|\tau_1 - \tau_2|}$ one obtains (14b) with $\sigma_t = g(t, t)$.

The proof of (14d) starts from

$$ dg(0, t)/dt = [\partial g(0, \tau)/\partial t]_{\tau = t} + [\partial g(0, \tau)/\partial\tau]_{\tau = t} \tag{69} $$

and follows very closely the preceding derivation.

11 Eq. (61) is essentially the expression for the Fredholm determinant of a kernel expressed in terms of its Volterra reciprocal function. See E. T. Whittaker and G. N. Watson, "A Course of Modern Analysis," Cambridge University Press, Cambridge, Eng., sec. 11.21, example 2, and sec. 11.22; 1940.

APPENDIX I


Eq. (21) can be written in the form

$$ u'' + 2\beta\,\frac{dt}{dx}\,(u' - u) = 0. \tag{70} $$

The Wronskian $w(x)$ defined by

$$ w(x) = u_1 u_2' - u_2 u_1' $$

can thus be conveniently computed as

$$ w(x) = w(a)\exp\{2\beta[t(a) - t(x)]\}. \tag{71} $$

If $u_1(x)$, $u_2(x)$ are two linearly independent solutions of (70), then the initial condition for $u(x)$ is satisfied by

$$ u(x) = [u_2'(x_0) - u_2(x_0)]\,u_1(x) - [u_1'(x_0) - u_1(x_0)]\,u_2(x) \tag{72} $$

unless both coefficients vanish. Using (70) once more one obtains


— U(X) u(x) — 


jute) = —(28 2) * fusrcedanla) ~ vl" (asus) (78 
d 
= —w(a) 
|= [uas)/ula)}”* = fola) 
-K (0) /28 (us! (es)ua(e) — i"es)ual\2. (7B) 


} 
"1 


APPENDIX II
To obtain (8) for the function #(y | ¢, ¢, \) defined by 


id 


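(7), we write the Gaussian random function $x(t)$ in the form

$$ x(t) = \sum_\nu c_\nu \sqrt{\lambda_\nu}\,\varphi_\nu(t) \tag{76} $$

where the numbers $\lambda_\nu$ and the functions $\varphi_\nu(t)$ are defined as the eigenvalues and eigenfunctions of the integral equation

$$ \int_0^t \rho(\tau - \tau')\,K(\tau')\,\varphi_\nu(\tau')\,d\tau' = \lambda_\nu \varphi_\nu(\tau) \tag{77} $$

with normalization

$$ \int_0^t K(\tau)\,\varphi_\nu^2(\tau)\,d\tau = 1, \tag{78} $$

and where the random variables $c_\nu$ are independent and Gaussian with $\langle c_\nu \rangle_{av} = 0$ and $\langle c_\nu^2 \rangle_{av} = 1$.

We then have

$$ \left\langle \exp\left(i\eta x(0) + i\zeta x(t) - \lambda \int_0^t K(\tau)\,x^2(\tau)\,d\tau\right)\right\rangle_{av} = \int \exp\left(i\sum_\nu c_\nu v_\nu - \lambda \sum_\nu \lambda_\nu c_\nu^2\right) \prod_\nu \exp\left(-\tfrac{1}{2} c_\nu^2\right) \frac{dc_\nu}{\sqrt{2\pi}} \tag{79} $$

with

$$ v_\nu = \sqrt{\lambda_\nu}\,(\eta \varphi_\nu(0) + \zeta \varphi_\nu(t)), \tag{80} $$

since

$$ \int_0^t K(\tau)\,x^2(\tau)\,d\tau = \sum_\nu \lambda_\nu c_\nu^2. \tag{81} $$

Evaluation of the integral (79) yields

$$ \prod_\nu (1 + 2\lambda\lambda_\nu)^{-1/2}\,\exp\left(-\frac{1}{2}\sum_\nu \frac{\lambda_\nu[\eta \varphi_\nu(0) + \zeta \varphi_\nu(t)]^2}{1 + 2\lambda\lambda_\nu}\right). \tag{82} $$

We define the function $g_\lambda(\tau_1, \tau_2)$ by

$$ g_\lambda(\tau_1, \tau_2) = \sum_\nu \frac{\lambda_\nu \varphi_\nu(\tau_1)\,\varphi_\nu(\tau_2)}{1 + 2\lambda\lambda_\nu} \tag{83} $$

and obtain from (77) that

$$ g_\lambda(\tau_1, \tau_2) + 2\lambda \int_0^t \rho(\tau_1 - \tau)\,K(\tau)\,g_\lambda(\tau, \tau_2)\,d\tau = \sum_\nu \lambda_\nu \varphi_\nu(\tau_1)\,\varphi_\nu(\tau_2) = \rho(\tau_1 - \tau_2). \tag{84} $$

In terms of $g_\lambda(\tau_1, \tau_2)$ we can now express the exponent in (82):

$$ \sum_\nu \frac{\lambda_\nu[\eta \varphi_\nu(0) + \zeta \varphi_\nu(t)]^2}{1 + 2\lambda\lambda_\nu} = \eta^2 g_\lambda(0, 0) + 2\eta\zeta\,g_\lambda(0, t) + \zeta^2 g_\lambda(t, t). \tag{85} $$

The product is obtained by writing

$$ \frac{\partial}{\partial\lambda} \ln \prod_\nu (1 + 2\lambda\lambda_\nu)^{-1/2} = -\sum_\nu \frac{\lambda_\nu}{1 + 2\lambda\lambda_\nu} = -\int_0^t K(\tau)\,g_\lambda(\tau, \tau)\,d\tau. \tag{86} $$

Since the product is equal to unity when $\lambda = 0$, we get

$$ \prod_\nu (1 + 2\lambda\lambda_\nu)^{-1/2} = \exp\left(-\int_0^{\lambda} d\kappa \int_0^t K(\tau)\,g_\kappa(\tau, \tau)\,d\tau\right). \tag{87} $$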




On the Capacity of a Noisy Continuous Channel*


SABURO MUROGA†


Summary: The capacity of a noisy continuous channel is discussed
in both cases where the signal transmitted over the channel
is expressible by a process with mutually independent random 
variables and where it is expressible by a Markov process. Unlike 
discrete channels, continuous channels impose certain restrictions 
on transmitter power in general. In the case of a continuous channel 
under disturbance of additive noise, a theorem on the capacity 
in terms of channel parameters is obtained and applied. Then in 
the general case of a Markov process a general procedure to calcu- 
late the capacity is shown. 


INTRODUCTION 


The channel capacity may be regarded as the most important concept introduced by Shannon in information theory.¹ The author has already discussed the capacity of a noisy discrete channel.²,³ We will now discuss that of a noisy continuous channel.

In general, the type of restriction on continuous channels 
is fairly different from that on discrete channels, so we 
must treat it separately. 

If a function of time $f(t)$, which is an input signal to the channel, is limited to the frequency band 0 to $W$ cps, we have the following expansion:

$$ f(t) = \sum_{n=-\infty}^{\infty} X_n\,\frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)}, \tag{1} $$

where $X_n = f(n/2W)$. This is a theorem which Someya⁴ and Shannon derived independently. With this sampling theorem, we need consider only the ordinates $f(n/2W)$ at a series of discrete points spaced $1/2W$ seconds apart. Intervals between two successive points in time will be called Nyquist intervals.

A CONTINUOUS CHANNEL EXPRESSED BY A PROCESS
WITH MUTUALLY INDEPENDENT VARIABLES


The Capacity of a Continuous Channel Expressed by a Pro- 
cess with Mutually Independent Variables 


Now we assume that the sampled values of signal
$f(n/2W)$ are transmitted independently of each other
over the continuous channel. Let $P(x)$ be the probability
density that the signal value at the input of the channel
at a sampled time point is $x$ and $P'(y)$ be the probability

* Manuscript received by the PGIT, July 11, 1956.

† Elec. Communication Lab., Nippon Telegraph and Telephone Public Corp., Tokyo, Japan.

1 C. E. Shannon and W. Weaver, "The Mathematical Theory of Communication," Univ. of Illinois Press, Chicago, Ill.; 1949.

2 S. Muroga, "On the capacity of a discrete channel I," J. Phys. Soc. Japan, vol. 8, pp. 484-494; July/August, 1953.

3 S. Muroga, "On the capacity of a discrete channel II," J. Phys. Soc. Japan, vol. 11, pp. 1109-1120; October, 1956.

4 I. Someya derived this theorem in "Theory of Waveform Transmission," (in Japanese), Shukyosha Co., Tokyo, ch. 4; 1944.


density that the value at the corresponding point of
time at the output is $y$. Let $p_x(y)$ be the conditional
probability density that the signal value $y$ is received
at the output of the continuous channel under disturbance
of a stationary noise when the signal value $x$ is transmitted.
This specifies the statistical property of this
channel and will be called the channel transition density
function.

Now the transmission rate $R$ for this channel is defined as

$$ R = -\int P'(y) \log P'(y)\,dy + \iint P(x)\,p_x(y) \log p_x(y)\,dx\,dy \tag{2} $$

and its capacity as the maximum of $R$ for $P(x)$ with $p_x(y)$ fixed. But we have the following relation between $P(x)$ and $P'(y)$:

$$ \int P(x)\,p_x(y)\,dx = P'(y). \tag{3} $$

We will introduce an auxiliary function $X(y)$ as follows in order to make maximization of (2) easier than a direct method.

Theorem 1: Assume that the dissemination characteristic equation

$$ \int p_x(y)\,X(y)\,dy = \int p_x(y) \log p_x(y)\,dy \tag{4} $$

has a solution $X(y)$. Then the dissemination

$$ H_x(y) = -\iint P(x)\,p_x(y) \log p_x(y)\,dx\,dy \tag{5} $$

is expressed simply in the following form:

$$ H_x(y) = -\int P'(y)\,X(y)\,dy. \tag{6} $$

Integrate the Fredholm integral equation of the first kind (4) over $x$ after multiplying it by $P(x)$; then, since its right side is just $-H_x(y)$, we have

$$ H_x(y) = -\iint P(x)\,p_x(y)\,X(y)\,dy\,dx. $$

If the order of the integrations in the right side of the above is interchangeable, we have finally (6), using the relation (3). Then the transmission rate takes the following simpler form:

$$ R = -\int P'(y) \log P'(y)\,dy + \int P'(y)\,X(y)\,dy. \tag{7} $$
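The simplified form of the transmission rate can be tried numerically. The following sketch is an illustration only: it assumes a Gaussian input density of power $P$ and additive Gaussian noise of power $N$, for which the right side of the dissemination characteristic equation does not depend on $x$, so that a constant $X(y)$ solves it. The output density is obtained from the relation between $P(x)$ and $P'(y)$ by midpoint-rule integration. The names `gauss` and `rate_nats` are hypothetical.

```python
import math

def gauss(z, var):
    """Zero-mean Gaussian density with variance var."""
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def rate_nats(P, N, half_width=10.0, n=400):
    """Transmission rate in nats per sample for Gaussian input (power P) and
    additive Gaussian noise (power N), using R = -int P' log P' + int P' X."""
    h = 2.0 * half_width / n
    grid = [-half_width + (i + 0.5) * h for i in range(n)]
    p_in = [gauss(x, P) for x in grid]
    # Output density P'(y) = int P(x) p_x(y) dx with p_x(y) = k(y - x).
    p_out = [h * sum(px * gauss(y - x, N) for px, x in zip(p_in, grid))
             for y in grid]
    # Constant solution X(y) = int k(z) log k(z) dz for additive noise.
    X = h * sum(gauss(z, N) * math.log(gauss(z, N)) for z in grid)
    entropy_out = -h * sum(q * math.log(q) for q in p_out if q > 0.0)
    return entropy_out + h * sum(q * X for q in p_out)

if __name__ == "__main__":
    P, N = 2.0, 0.5
    print(rate_nats(P, N), 0.5 * math.log(1.0 + P / N))
```

For this input the rate agrees with the familiar value $\tfrac{1}{2}\log(1 + P/N)$ per sample, which gives a quick check on the signs in the two terms.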


It should be noted that $X(y)$ is a known function which can be determined from $p_x(y)$ alone.

Therefore, the capacity for this channel could be obtained if $P(x)$, obtained from (3) corresponding to the $P'(y)$ for which (7) is maximized under the condition $\int P'(y)\,dy = 1$, does not take a negative value over the defined interval of $x$. In the case of a noisy continuous channel, however, we may have to consider additional restrictive conditions, for example fixing the average transmitter power or the peak transmitter power at a constant value, or keeping the average value of the differences between the signal values of the transmitter and of the receiver within a certain range. Let us express it in general as follows:

$$ \iint S(x, y, P(x), P'(y))\,dx\,dy = \text{constant}. \tag{8} $$

Maximization of $R$ under this restrictive condition may be done conveniently by Lagrange's method, so we need maximize only the following:

$$ U = -\int P'(y) \log P'(y)\,dy + \int P'(y)\,X(y)\,dy + \mu \int P'(y)\,dy + \lambda \iint S(x, y, P(x), P'(y))\,dx\,dy + \int \tau(y)\left[\int P(x)\,p_x(y)\,dx - P'(y)\right] dy, \tag{9} $$

where $\mu$, $\lambda$ and $\tau(y)$ are coefficients to be determined. First the calculus of variation of (9) with $P'(y)$ gives

$$ -\log P'(y) - 1 + X(y) + \mu + \lambda \int \frac{\partial S}{\partial P'(y)}\,dx - \tau(y) = 0 \tag{10} $$

and that of (9) with $P(x)$ gives

$$ \lambda \int \frac{\partial S}{\partial P(x)}\,dy + \int \tau(y)\,p_x(y)\,dy = 0. \tag{11} $$

(10) gives $P'(y)$, that is,

$$ P'(y) = \frac{\exp\left(X(y) + \lambda \int \frac{\partial S}{\partial P'(y)}\,dx - \tau(y)\right)}{\int \exp\left(X(y) + \lambda \int \frac{\partial S}{\partial P'(y)}\,dx - \tau(y)\right) dy}, \tag{12} $$

while $\mu$ is determined from $\int P'(y)\,dy = 1$. From (11) we have

$$ \int \tau(y)\,p_x(y)\,dy = -\lambda \int \frac{\partial S}{\partial P(x)}\,dy. \tag{13} $$

Insertion of (12) into (7) gives

$$ C = \log \int \exp\left(X(y) + \lambda \int \frac{\partial S}{\partial P'(y)}\,dx - \tau(y)\right) dy - \frac{\int \left\{\lambda \int \frac{\partial S}{\partial P'(y)}\,dx - \tau(y)\right\} \exp\left(X(y) + \lambda \int \frac{\partial S}{\partial P'(y)}\,dx - \tau(y)\right) dy}{\int \exp\left(X(y) + \lambda \int \frac{\partial S}{\partial P'(y)}\,dx - \tau(y)\right) dy}, \tag{14} $$

where $\lambda$ and $\tau(y)$ should be determined from (3), (8), and (13). If $P(x)$, which can be calculated from the $P'(y)$ thus obtained, satisfies the required condition of being non-negative over the interval of $x$, $C$ of (14) gives the capacity itself.

The Fredholm integral equation of the first kind (13) cannot be solved explicitly in general. However, under a certain condition we can solve it. Here we will apply Kameda's method to it.⁵ Let $\tau(y)$ and $p_x(y)$ be expanded as follows:

$$ \tau(y) = c_0 \varphi_0(y) + c_1 \varphi_1(y) + \cdots, \tag{15} $$

where the $\varphi_n(y)$ $(n = 0, 1, 2, \cdots)$ are a system of normalized orthogonal functions, for example


f a x 
go(t) = Rae! ee) 
ue mevad x 
Od ie ) eI 5 ue) 
a \ ds x 
ee) = zo (5) ge (-5) 
while the kernel $p_x(y)$ is expanded as follows:

$$ p_x(y) = \chi_0(x)\varphi_0(y) + \chi_1(x)\varphi_1(y) + \chi_2(x)\varphi_2(y) + \cdots, \tag{17} $$

where the coefficients $\chi_0(x), \chi_1(x), \cdots$ are known functions. From a property of the normalized orthogonal functions, insertion of (15) and (17) into (13) gives

$$ -\lambda \int \frac{\partial S}{\partial P(x)}\,dy = \sum_n c_n \chi_n(x). \tag{18} $$

Then our problem is reduced to the determination of the coefficients $c_0, c_1, c_2, \cdots$ in an expansion of the left side of (18) with the system of functions $\chi_n(x)$ $(n = 0, 1, 2, \cdots)$. However, $\chi_0(x), \chi_1(x), \cdots$ are not always orthogonal to each other, so we cannot expand it in general. But as a special case, if they are linearly independent of each other, we can find a system of orthogonal functions

$$ \theta_0(x) = a_{00}\chi_0(x), \qquad \theta_1(x) = a_{10}\chi_0(x) + a_{11}\chi_1(x), \qquad \cdots \tag{19} $$

from the above system of functions. Therefore we can determine the coefficients $c_0, c_1, \cdots$ by comparing the right side of (18) with the expansion of the left side of (18) with $\chi_0(x), \chi_1(x), \cdots$, which can be obtained by (19) after the left side of (18) is expanded with $\theta_0(x), \theta_1(x), \cdots$.

5 D. Kameda, "A general method for solving linear integral equations," Proc. Phys. Math. Soc. Japan, vol. 10, Part I, pp. 231-235; 1927; vol. 11, Part II, pp. 17-27; 1928; vol. 11, Part III, pp. 169-180; 1928.
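The construction of orthogonal functions from linearly independent ones, with each new function a combination of the functions up to that index, is the triangular scheme of Gram-Schmidt orthogonalization. A minimal sketch follows, assuming functions sampled on a uniform midpoint grid and an unweighted inner product approximated by a Riemann sum. The name `gram_schmidt` and the monomials used as stand-ins for the $\chi_n(x)$ are hypothetical choices for illustration, not functions taken from the paper.

```python
import math

def gram_schmidt(funcs, grid, h):
    """Orthonormalize sampled functions in the given order.  The k-th output
    is a linear combination of the first k + 1 inputs only, which is the
    triangular structure of the orthogonal system."""
    def inner(u, v):
        return h * sum(a * b for a, b in zip(u, v))

    basis = []
    for f in funcs:
        v = [f(x) for x in grid]
        for b in basis:
            c = inner(v, b)                      # component along b
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(inner(v, v))            # zero only if inputs are dependent
        basis.append([vi / norm for vi in v])
    return basis

if __name__ == "__main__":
    n = 2000
    h = 2.0 / n
    grid = [-1.0 + (i + 0.5) * h for i in range(n)]
    chi = [lambda x: 1.0, lambda x: x, lambda x: x * x]   # stand-ins for chi_n
    theta = gram_schmidt(chi, grid, h)
    for i in range(3):
        print([round(h * sum(a * b for a, b in zip(theta[i], theta[j])), 6)
               for j in range(3)])
```

The printed matrix of inner products should be close to the unit matrix; the procedure breaks down (zero norm) exactly when the input functions are linearly dependent, which is why the text requires linear independence.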


The Capacity of a Continuous Channel Where an Average Transmitter Power is Limited to a Certain Constant Value and $p_x(y) = k(y - x)$

If the noise is additive to the signal and also independent of it, we have $p_x(y) = k(y - x)$. Many practical communication systems are disturbed by noise of this type. Now we assume that the average power of the transmitted signals is fixed at a certain value $P$. That is,

$$ \int x^2 P(x)\,dx = P \tag{20} $$

will be used instead of (8). Then we have
Tee © ol ge ON OR 
where 
[ t@ vy = (22) 
We have

$$ P'(y) = \frac{\exp(X(y) - \tau(y))}{\int \exp(X(y) - \tau(y))\,dy} \tag{23} $$

from (12) and

$$ \lambda x^2 = -\int \tau(y)\,p_x(y)\,dy \tag{24} $$

from (13). The integration of it over $x$ after multiplying it by $P(x)$ and interchanging the order of integrations gives

$$ \lambda P = -\int \tau(y)\,P'(y)\,dy, \tag{25} $$

and (14) is reduced to

$$ C = \log \int \exp(X(y) - \tau(y))\,dy - P\lambda. \tag{26} $$

With $p_x(y) = k(y - x)$ we get

$$ -\int p_x(y) \log p_x(y)\,dy = H(n), \tag{27} $$

which shows its left side to be equal to a constant value $H(n)$. Then a solution for the dissemination characteristic equation is

$$ X(y) = -H(n). \tag{28} $$


Corollary 1: If the noise is additive to the signal and also independent of it, the solution $X(y)$ of the dissemination characteristic equation is equal to the noise entropy $H(n)$ with the minus sign.

Therefore we have

$$ P'(y) = \frac{\exp(-\tau(y))}{\int \exp(-\tau(y))\,dy}, \qquad C = -H(n) + \log \int \exp(-\tau(y))\,dy + \frac{\int \tau(y) \exp(-\tau(y))\,dy}{\int \exp(-\tau(y))\,dy}. \tag{29} $$

Now we must determine $\tau(y)$ from

$$ -\lambda x^2 = \int \tau(y)\,k(y - x)\,dy, \tag{30} $$

which corresponds to (24).⁶ With change of the argument, it is reduced to

$$ -\lambda x^2 = \int \tau(x + z)\,k(z)\,dz. \tag{31} $$


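Let us expand $k(z)$ and $\tau(y)$ into infinite series with a system of Hermite functions:

$$ k(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2} \sum_{m=0}^{\infty} \frac{a_m}{m!}\,H_m(z) \tag{32} $$

and

$$ \tau(y) = \sum_{m=0}^{\infty} b_m H_m(y), \tag{33} $$

where

$$ H_m(z) = (-1)^m e^{z^2/2}\,\frac{d^m}{dz^m}\,e^{-z^2/2}, \qquad a_m = \int k(z)\,H_m(z)\,dz, \qquad b_m = \frac{1}{m!\sqrt{2\pi}} \int \tau(z)\,H_m(z)\,e^{-z^2/2}\,dz. $$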


6 A formal solution for (30) could be obtained by means of the Fourier integral with some technique. (Cf. C. Titchmarsh, "Introduction to the Theory of Fourier Integrals," Oxford Press, Cambridge, England, p. 314; 1937.) But here the author is highly indebted to Zen'iti Kiyasu whose suggestion on the following method has made the solution of (30) more elegant.


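Particularly we have $a_0 = \int k(z)\,dz = 1$ since $H_0(z) = 1$, and $a_1 = \int z\,k(z)\,dz$ is the average value $\bar{n}$ of the noise, and furthermore we have $a_2 = \int (z^2 - 1)\,k(z)\,dz = \overline{n^2} - 1$, where $\overline{n^2}$ is the mean square value of the noise. It should be noted here that $k(z)$ in $a_m$ and $\tau(y)$ in $b_m$ are not weighted with the same function. Then we have

$$ \tau(x + z) = \sum_{k=0}^{\infty} \tau^{(k)}(z)\,\frac{x^k}{k!} = \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} b_{n+k}\,\frac{(n + k)!}{n!}\,H_n(z)\,\frac{x^k}{k!}. \tag{34} $$

The right side of (31) may be expanded as follows:

$$ \int \tau(x + z)\,k(z)\,dz = \sum_{k=0}^{\infty} \left[\sum_{m=0}^{\infty} a_m b_{m+k}\,\frac{(m + k)!}{m!}\right] \frac{x^k}{k!}. \tag{35} $$

Comparison of it with the left side of (31) gives us a set of equations,

$$ \sum_{m=0}^{\infty} a_m b_{m+k}\,\frac{(m + k)!}{m!} = \begin{cases} 0 & (k \neq 2) \\ -2\lambda & (k = 2) \end{cases} \tag{36} $$

whose set of solutions is

$$ b_0 = -\lambda(2a_1^2 - a_2), \qquad b_1 = 2\lambda a_1, \qquad b_2 = -\lambda, \qquad b_3 = b_4 = \cdots = 0. \tag{37} $$

Finally we have

$$ \tau(y) = -\lambda[(2a_1^2 - a_2)H_0(y) - 2a_1 H_1(y) + H_2(y)] = -\lambda[2a_1^2 - a_2 - 2a_1 y + y^2 - 1]. \tag{38} $$

Calculation of $\lambda$, $P'(y)$ and $C$ with these gives the following theorem, setting $\gamma = -\lambda$.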
Theorem 2: If the noise is independent of the signal and the amplitude of the received signal is a linear sum of the amplitudes of the noise and the transmitted signal, the capacity of this continuous channel with an average transmitter power $P$ is expressed as follows:

$$ C = -H(n) + \tfrac{1}{2} \log\left(\frac{\pi}{\gamma}\right) + \gamma\left[P + \overline{n^2} - (\bar{n})^2\right] \quad \text{(entropy/freedom)} \tag{39} $$

where $H(n)$ is the noise entropy for this channel. And the probability density $P'(y)$ which gives this capacity value to the transmission rate is

$$ P'(y) = \sqrt{\gamma/\pi}\,\exp(-\gamma(y - \bar{n})^2), \tag{40} $$

where

$$ \gamma = \tfrac{1}{2}\left(P + \overline{n^2} - (\bar{n})^2\right)^{-1} \tag{41} $$

and $\bar{n}$ is the average value of the noise and $\overline{n^2}$ the average of the squared value of the noise. Here validity of this theorem requires that $P(x)$ corresponding to the above $P'(y)$ be non-negative over the whole defined interval of $x$.

Eq. (39) can be written in bits per second as follows:

$$ C = 2W\left[-H(n) + \tfrac{1}{2} \log_2\left(\frac{\pi}{\gamma}\right) + \gamma\left(P + \overline{n^2} - (\bar{n})^2\right) \log_2 e\right] \quad \text{(bits/second)} \tag{42} $$

where $W$ is the upper bound of frequency (cps) of the channel and the noise entropy $H(n)$ should be calculated in bits per degree of freedom.
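The capacity expression of Theorem 2 is straightforward to evaluate. The sketch below computes it in natural units per degree of freedom and converts to bits per second; as a check it uses Gaussian noise, whose entropy $\tfrac{1}{2}\log(2\pi e N)$ is a standard result supplied here as an assumption of the example. The function names are hypothetical.

```python
import math

def capacity_nats(noise_entropy_nats, noise_mean, noise_mean_square, P):
    """Capacity per degree of freedom, in nats, from the noise entropy and
    the first two noise moments, for average transmitter power P."""
    spread = P + noise_mean_square - noise_mean ** 2
    gamma = 0.5 / spread
    return -noise_entropy_nats + 0.5 * math.log(math.pi / gamma) + gamma * spread

def capacity_bits_per_second(W, noise_entropy_nats, noise_mean,
                             noise_mean_square, P):
    """Same quantity multiplied by 2W degrees of freedom per second and
    converted from nats to bits."""
    c = capacity_nats(noise_entropy_nats, noise_mean, noise_mean_square, P)
    return 2.0 * W * c / math.log(2.0)

if __name__ == "__main__":
    W, P, N = 3000.0, 10.0, 1.0
    h_gauss = 0.5 * math.log(2.0 * math.pi * math.e * N)   # Gaussian noise entropy
    print(capacity_bits_per_second(W, h_gauss, 0.0, N, P))
    print(W * math.log2(1.0 + P / N))
```

The two printed values coincide, showing how the general expression reduces to the familiar white-noise formula when the noise is Gaussian with zero mean.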

Shannon's famous theorem on the capacity of a continuous channel under the disturbance of white noise may be obtained as a special case from this general theorem as follows.

Corollary 2: The capacity of a channel of band $W$ cps disturbed by white noise of power $N$ when the average transmitter power is $P$ is given by

$$ C = W \log_2 \frac{P + N}{N} \quad \text{(bits/second)}. \tag{43} $$

Proof: From the conditional probability density of the white noise

$$ p_x(y) = \frac{1}{\sqrt{2\pi N}} \exp\left(-\frac{(y - x)^2}{2N}\right) \tag{44} $$

we have

$$ X(y) = -H(n) = -\log_2 \sqrt{2\pi e N}. $$

We have $\bar{n} = 0$, $\overline{n^2} = N$ and $\gamma = \tfrac{1}{2}(P + N)^{-1}$. Finally,

$$ C = \log_2 \sqrt{\frac{P + N}{N}}, $$

$$ P'(y) = \frac{1}{\sqrt{2\pi(P + N)}} \exp\left(-\frac{y^2}{2(P + N)}\right), $$

$$ P(x) = \frac{1}{\sqrt{2\pi P}} \exp\left(-\frac{x^2}{2P}\right). $$

The first equation of the above is expressed in bits per degree of freedom for simplicity of calculation, so multiplication of it by $2W$ gives the statement to be proven.

Next we will discuss a case of the noise with Rayleigh 
distribution. 

Corollary 3: The capacity of a continuous channel of 
band W cps and of an average transmitter power P under 
the disturbance of the noise of power N with the Rayleigh 
distribution is given by 


1S 


; Dae [fle 
Gs ess — 
VW | to Zz (ie 




P+wN 
) + (E a | log. | 


bits/second, (45) 
provided that P > N, where 
M0 = Tapa (Se) ne 
Melee Nye ome 
Ee (25), (47) 


Proof: The Rayleigh distribution’ is 
( 


|= exp (—2z/-V/N) dz 


We) dz = | VN 


\O 


aa) 


a) 


where $\bar{z} = \sqrt{N}$ and $\overline{(z - \sqrt{N})^2} = N$. Then we have $\bar{n} = \sqrt{N}$, $\overline{n^2} = 2N$ and $\gamma = 1/[2(P + N)]$. Therefore we get (46) as $P'(y)$. From


H(n) = log. \/ Nese Ae 


(48) 


we have the capacity in entropy per degree of freedom. Multiplication of it by $2W$ gives us the capacity in entropy per second. The probability density $P(x)$ of (47) corresponding to this $P'(y)$ is obtained after some complex calculation of the Fourier integral of (3). However, $P(x)$ in (47) does not satisfy the requirement for a probability density, since it is negative for $x > P/\sqrt{N}$, though its absolute value may be very small. If $P \gg N$, we may be able to ignore the small difference between the above result and a correct one to be obtained. Consequently we get the above Corollary 3.


A CONTINUOUS CHANNEL EXPRESSED BY A
MARKOV PROCESS


A Markov Process in a General State Space 


The generalized continuous channels where not only 
sample sequences of signals but also noises are expressed 
as Markov processes, could be found easily in many real 
cases. Before discussing it, we will consider the difference 
between the Markov processes with continuous variables 
and those with discrete variables.* 

When a signal value $x_n$ at a certain point of time depends on a set of signal values at $\nu$ points of time in the past, $x_{n-1}, \cdots, x_{n-\nu}$, the sequence of random variables is called a $\nu$-dimensional Markov process, and particularly for $\nu = 1$ a simple Markov process. However, since the vector process with random variables $\{\hat{x}_n\}$, where $\hat{x}_n = (x_{n-\nu+1}, \cdots, x_n)$, has the property of a simple Markov process, the $\nu$-dimensional Markov process can be reduced to a simple one. In the case of discrete variables where each $x_n$ takes $N$ different values, a sequence of random vector variables $\{\hat{x}_n\}$ can be considered a simple Markov process where a random variable takes $N^\nu$ values, and therefore an idea of a vector variable may not particularly be needed in the transformed simple Markov process. Moreover there is no essential difference from Shannon's diagrams which express these processes, as we see in Shannon and Weaver¹ and Muroga.²,³ The reduction of multiple Markov processes into simple ones can be applied even to a case of continuous variables, but the spatial representation does not remain as easy with increase of dimension as in the discrete case.

7 J. L. Lawson and G. E. Uhlenbeck, "Threshold Signals," Rad. Lab. Ser., McGraw-Hill Book Co., New York, N.Y.; vol. 24, p. 53; 1950.

8 J. L. Doob, "Stochastic Processes," John Wiley and Sons, New York, N.Y.; 1953.
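For discrete variables, the reduction of a $\nu$-dimensional Markov process to a simple one can be written out directly. The sketch below is an illustration only: it builds the transition matrix over the $N^\nu$ vector states from a conditional probability of the next symbol given the last $\nu$ symbols. The function `lift_to_simple_chain` and the example rule `cond_prob` are hypothetical.

```python
from itertools import product

def lift_to_simple_chain(cond_prob, N, nu):
    """Return the vector states and the transition matrix of the simple
    Markov process equivalent to an order-nu process over N symbols.
    cond_prob(history, nxt) is the probability of symbol nxt given the
    tuple of the last nu symbols, oldest first."""
    states = list(product(range(N), repeat=nu))       # N**nu vector states
    index = {s: i for i, s in enumerate(states)}
    T = [[0.0] * len(states) for _ in states]
    for s in states:
        for nxt in range(N):
            successor = s[1:] + (nxt,)                # drop oldest, append newest
            T[index[s]][index[successor]] += cond_prob(s, nxt)
    return states, T

if __name__ == "__main__":
    def cond_prob(history, nxt):
        # Assumed rule: after two equal symbols, symbol 1 is favored.
        p_one = 0.9 if history[0] == history[1] else 0.5
        return p_one if nxt == 1 else 1.0 - p_one

    states, T = lift_to_simple_chain(cond_prob, N=2, nu=2)
    for s, row in zip(states, T):
        print(s, row)
```

Each vector state has at most $N$ successors, since the successor must share $\nu - 1$ coordinates with its predecessor; this sparsity is what is lost in the picture, though not in the mathematics, when the variables become continuous.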
Let $X$ be a space of points $\xi$ and let $F_X$ be a Borel field of $X$ sets. A function $p(\xi, A)$ of $\xi \in X$ and $A \in F_X$ is called a stochastic transition function if it has the following properties. Particularly when it specifies a property of a channel, it will be called a channel transition function. That is, 1) $p(\xi, A)$ for fixed $\xi$ determines a probability measure in $A$; 2) $p(\xi, A)$ for fixed $A$ determines a $\xi$ function measurable with respect to the field $F_X$.


The transition probability after $n$ Nyquist intervals can be calculated as follows:

$$ p^{(1)}(\xi, A) = p(\xi, A), \qquad p^{(n+1)}(\xi, A) = \int p^{(n)}(\eta, A)\,p(\xi, d\eta). \tag{49} $$

And the probability that the signal value is in $A$ at the $n$-th sample point is given by

$$ \int p^{(n-1)}(\xi, A)\,P(d\xi), \tag{50} $$

which is $P(A)$ for $n = 1$. If it is independent of $n$, the process is strictly stationary and $P(A)$ is called a stationary absolute probability distribution.

Under a certain condition (Doeblin), many properties of a Markov process in a general state space are obtained with mathematical rigor. Roughly speaking, Doeblin's condition is uniformity in $\xi$ on the smallness of $p(\xi, A)$ for small $A$. This imposes a rather weak restriction on a property of channels. But most of the models of channels which have physical significance may satisfy it.

Under Doeblin's condition,

$$ \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} p^{(m)}(\xi, E) = q(\xi, E) \tag{51} $$

exists uniformly in $\xi$ and $E$, where $E \in F_X$. A set $E$ is called a consequent set if, for some $\xi$, $p^{(n)}(\xi, E) = 1$ for all $n$, and then $E$ is called a consequent of $\xi$. A set which is a consequent of every one of its points is called an invariant set. Such a set is either empty or has a finite valued measure $\varphi(E) > \varepsilon$, where $\varepsilon$ is a positive number. Like a discrete case, decomposition of $X$ is possible. If $F$ is a set for which

$$ \lim_{n \to \infty} p^{(n)}(\xi, F) = 0, \qquad \xi \in X, $$

the set $F$ is called a transient set. Assume a decomposition of $X$ into disjunct invariant sets $E_1, E_2, \cdots$ and a transient set $F = X - \bigcup_a E_a$. And if there is a probability measure ${}_a\pi$ of sets $E \in F_X$ corresponding to each $E_a$ such that

$$ \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} p^{(m)}(\xi, E) = {}_a\pi(E), \qquad {}_a\pi(E_a) = 1 \ \text{and} \ \xi \in E_a, $$

then the $E_a$'s are called ergodic sets. Furthermore $E_a$ may be decomposed into cyclically moving subsets.

For simplicity we assume that there is only a single ergodic set which does not contain any cyclically moving subsets. Then the limit in (51) exists in an ordinary sense rather than Cesaro's and defines a stationary absolute probability distribution $\pi(E)$ independent of $\xi$. This is determined from (50):

$$ P(A) = \int p(\xi, A)\,P(d\xi). $$
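The difference between the averaged (Cesaro) limit and the ordinary limit is easy to see in a finite-state analogue, where the transition function becomes a matrix. The sketch below is an illustration under that assumption, with two hypothetical two-state chains: one with a single ergodic set and no cyclically moving subsets, and one that alternates cyclically between its states. The names `step` and `cesaro_average` are hypothetical.

```python
def step(dist, T):
    """One transition: new_j = sum_i dist_i * T[i][j]."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

def cesaro_average(T, start, n_steps):
    """Average of the m-step transition probabilities from state `start`
    for m = 1, ..., n_steps (finite-state form of the averaged limit)."""
    dist = [0.0] * len(T)
    dist[start] = 1.0
    acc = [0.0] * len(T)
    for _ in range(n_steps):
        dist = step(dist, T)
        acc = [a + d for a, d in zip(acc, dist)]
    return [a / n_steps for a in acc]

if __name__ == "__main__":
    ergodic = [[0.9, 0.1], [0.5, 0.5]]    # stationary distribution (5/6, 1/6)
    cyclic = [[0.0, 1.0], [1.0, 0.0]]     # two cyclically moving subsets
    print(cesaro_average(ergodic, 0, 20000))
    print(cesaro_average(cyclic, 0, 20000))
```

For the cyclic chain the $m$-step probabilities keep alternating and have no ordinary limit, yet their average settles at $(1/2, 1/2)$; for the first chain both limits exist and agree with the stationary distribution, which satisfies the stationarity relation above.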


The Capacity of a Continuous Channel Expressed by a 
Simple Markov Process 


Assume that when two signal values at the transmitter are $x$ and $x'$ respectively at two successive points of time which are one Nyquist interval apart, $y$ and $y'$ appear at the receiver at the two corresponding points. And we will define as a channel transition density function the conditional probability density $p_{xx'y}(y')$ that the received

it with $y'$ is the channel transition function. This specifies a statistical property of this channel.

Noise which is essentially a Markov process may be encountered often. For example, if noise is independent of signal and additive to it, $p_{xx'y}(y')$ is a function of only $x - x'$ and $y - y'$: $p_{xx'y}(y') = k(x - x', y - y')$. A signal also can be a Markov process, for example, in a radio receiver with avc in which a received amplitude $y$ at some point of time varies the amplification gain of the receiver so that the signal at the next point may be affected, bringing $y'$ as a received amplitude. Now consider ergodic sets at both transmitter and receiver, in similar fashion to the discrete case.³ Let $P_x(x')$ be the transition probability density that, when the amplitude at the first of two successive points of time at the transmitter is $x$, the amplitude at the second point is $x'$; it specifies a property of this ergodic set at the transmitter. Then the stationary absolute probability density $P(x)$, that $x$ is taken at a certain point of time, can be determined from

$$ \int P(x)\,P_x(x')\,dx = P(x'). \tag{52} $$


The similar relation must hold at the receiver, replacing simply $x$ and $x'$ by $y$ and $y'$, respectively.

$P(x, y)$ is the joint probability density that the amplitudes at the transmitter and the receiver are respectively $x$ and $y$ at a certain point of time. $P(y, y')$ is the joint probability density that the amplitudes at two successive points of time at the receiver are respectively $y$ and $y'$. Then from the property of the channel transition density function we have

$$ \iint P(x, y)\,P_x(x')\,p_{xx'y}(y')\,dx\,dx' = P(y, y') \tag{53} $$

and

$$ \iint P(x, y)\,P_x(x')\,p_{xx'y}(y')\,dx\,dy = P(x', y'). \tag{54} $$

Now let us introduce an auxiliary function $X(y, y')$.


Theorem 3: Let $X(y, y')$ be a solution of the following characteristic equation:

$$ \iint X(y, y')\,P(x, y)\,p_{xx'y}(y')\,dy\,dy' = \iint P(x, y)\,p_{xx'y}(y') \log p_{xx'y}(y')\,dy\,dy'. \tag{55} $$

Then the dissemination

$$ H_x(y) = -\iiiint P(x, y)\,P_x(x')\,p_{xx'y}(y') \log p_{xx'y}(y')\,dx\,dx'\,dy\,dy' $$

can be simplified as follows:

$$ H_x(y) = -\iint X(y, y')\,P(y, y')\,dy\,dy'. \tag{56} $$

A proof is similar to that of Theorem 1.

Now, to find a capacity for this channel we will try to get the maximum of the following transmission rate with average transmitter power fixed:

$$ R = -\iint P(y, y') \log \frac{P(y, y')}{\int P(y, y')\,dy'}\,dy\,dy' + \iint X(y, y')\,P(y, y')\,dy\,dy'. \tag{57} $$

Take this average transmitter power as $P$, that is,

$$ \iint (x^2 + x'^2)\,P(x)\,P_x(x')\,dx\,dx' = P \tag{58} $$

or

$$ \iiint (x^2 + x'^2)\,P_x(x')\,P(x, y)\,dx\,dx'\,dy = P. \tag{59} $$


For convenience, the latter expression will be used for the 
following calculations. 

Maximization of the transmission rate (57) subject to a 
constraint could be done by Lagrange’s method. That is, 
we need to maximize only the following: 


= I Ply, y’) log age dy dy’ 
Pty; y’) dy’ 


U = 


+ [f XW, v)PW, v) dy ay/ 
+ if to | EU) da [eu y) iy’ | dy 


+ [fff oo, 2 XW, WP] Dp.) 
— PR, Y)Pex ly’) log Dearsly’)] da da! dy dy’ 


+ ff, “Lif P(e, y) Pe! )Prerly’) de dy 
IEE, v) | di’ dy’ 
+ ff u,v) | |] Po, DPQ’. Ay!) dx de! 
— PY, v | dy dy’ 
+ ff r@Pa’) ae! dx +7 ff Py, ¥") dy ay! 


+ fff xe? + 2 Pe VPC, w) de da” dy. (60) 


The third term in the above expression may be derived 
from other relations but is included here for convenience 
of calculation. Also 


[ P.@) dca 
(61) 
/ Pty, y’) dy dy’ = 1 
are included in the above, as properties of the probability 


functions. Calculating the variation of P(y, y’) with U 
gives 


PGSY) 
Pty, y’) dy’ 


—log 


ea) tn) nl) 


= way) ry = ).0e G2) 


and with X(y, y’) gives 
OP t ) am 


ke sa): (63) 


Also, calculus of variations of it with P,(x’) and P(z, y) 
give respectively 




I v(a’, y P(x, Y)Prxyly’) dy dy’ 


+ ff uy, YPC, Deeely) dy dy! + x) 


+ | Na? + PC, ) dy = 0 (64) 
and 
[f ae, 29[ xe, wpe) 
= Paz vy’) log park | dex’ dy’ 
+ ff, P.O Pearly’) ae! dy! = Hex, 9) 
Sr / / BY, Y)PA)Dee' Ay’) da’ dy’ 
+ | r@? + 2 )P.(o’) da’ = 0. (65) 


Comparison of integration of (62) multiplied by P(y, y’) — 
over y and y’, and that of (65) multiplied by P(a, y) over — 
x and y, gives y = — AP — C. Insertion of it into (62) 
leads 


POY) 


= exp {-—C + X(y, y’) + nly) 
[ Pu, y’) dy’ 


— n(y’) — uly, y') — dP}, 6m 
whose integration over y’ gives 
e 1 w fe 
-exp {X(y, y’) — uly, y’) — AP} dy’, (67) 
where exp(—C) = W. This is a homogeneous integral 


equation. We must find eigenvalues of W for it to have 
nonzero solutions. If the smallest negative log W could 
be found among eigenvalues and then all of probability 
densities satisfy the required conditions, — log W gives 
the capacity for this channel. When the kernel of (67) is 
of convolution type, e "” can be shown to have a fairly 
simple form under a certain condition. 
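
As a purely numerical illustration of this eigenvalue problem, a homogeneous equation of the form φ(y) = W ∫ K(y, y') φ(y') dy' can be discretized on a grid and its smallest eigenvalue W estimated by power iteration. The sketch below is not the method of this paper: the function name, the midpoint-rule discretization, and the test kernel min(y, y') on [0, 1] (whose smallest eigenvalue is π²/4) are assumptions chosen only for illustration, and the actual kernel of (67) depends on multipliers that are not computed here.

```python
import math

def smallest_eigenvalue_w(kernel, a, b, n=100, iters=60):
    """Estimate the smallest W for which phi(y) = W * integral K(y, y') phi(y') dy'
    has a nonzero solution on [a, b]. Assumes a simple, positive dominant mode."""
    h = (b - a) / n
    ys = [a + (i + 0.5) * h for i in range(n)]          # midpoint grid
    # Discretized integral operator: A[i][j] = K(y_i, y_j) * h
    A = [[kernel(yi, yj) * h for yj in ys] for yi in ys]
    phi = [1.0 / math.sqrt(n)] * n                      # unit-norm start vector
    lam = 0.0
    for _ in range(iters):
        new = [sum(aij * pj for aij, pj in zip(row, phi)) for row in A]
        lam = math.sqrt(sum(v * v for v in new))        # tends to the dominant eigenvalue of A
        phi = [v / lam for v in new]
    return 1.0 / lam, ys, phi

# Hypothetical test kernel, not the kernel of (67).
w0, grid, eigenfunction = smallest_eigenvalue_w(lambda y, yp: min(y, yp), 0.0, 1.0)
print(w0, math.pi ** 2 / 4)
```

Power iteration finds the largest eigenvalue of the discretized operator, and W is its reciprocal, so the result is the smallest W. Whether that W also yields admissible probability densities must still be checked separately, as the text requires.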

Set Fredholm’s determinant, that is, the following 
equation which may now be called a capacity equation, 
to be zero: 


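$$D(W) = 1 - W \int K(\xi, \xi)\,d\xi + \frac{W^2}{2!} \iint \begin{vmatrix} K(\xi_1, \xi_1) & K(\xi_1, \xi_2) \\ K(\xi_2, \xi_1) & K(\xi_2, \xi_2) \end{vmatrix} d\xi_1\,d\xi_2 - \cdots = 0, \tag{68}$$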


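where K(y, y') is the kernel of (67). In general, the eigenvalues W can be obtained from it. When $W_0$ is a solution of D(W) = 0, an eigenfunction can be obtained as a solution, that is,
$$\exp(-\eta(y)) = D(y, y_1', W_0), \tag{69}$$
where
$$D(y, y'; W) = W K(y, y') - W^2 \int \begin{vmatrix} K(y, y') & K(y, \xi) \\ K(\xi, y') & K(\xi, \xi) \end{vmatrix} d\xi + \frac{W^3}{2!} \iint \begin{vmatrix} K(y, y') & K(y, \xi_1) & K(y, \xi_2) \\ K(\xi_1, y') & K(\xi_1, \xi_1) & K(\xi_1, \xi_2) \\ K(\xi_2, y') & K(\xi_2, \xi_1) & K(\xi_2, \xi_2) \end{vmatrix} d\xi_1\,d\xi_2 - \cdots \tag{70}$$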


and moreover $y_1'$ is chosen so as not to make $D(y, y_1', W_0)$ zero identically. If $W_0$ is an m-tuple solution of D(W) = 0, we have m solutions, and the most general solution can be expressed as a linear combination of them.

Comparison of the integration of a product of $P_x(x')$ and (64) over y', and the integration of a product of P(x, y) and (65) over y, leads us to


(a) = —[ (0, )PC, ») ay, (71) 
0 7(#) vanishes from (64). 

Summarizing the above:

Theorem 4: Obtain P(y, y') from the eigenvalue $W_0$ and the eigenfunction $D(y, y_1', W_0)$ of the homogeneous integral equation (67), using the capacity equation (68), and solve the following equations:


I/ wy, y')P(x, y)Pee ly’) dy dy’ 
| = -|f P(x, Y)Dexr Ly’) ve’, y’) 


— v(x, 9)} dy dy’ —» [ @* +e )P, aay (72) 




i PAD) Design) 10S Peers) 
— XY, Y')Paely’)] dx’ dy’ 


+ I | We YP AL Deal’) dae! dy — vay) 


+ ff vDP.O Dee sky!) de! dy’ 


be i NE Ply Pie a 0 (73) 
BY i= i P(x, y)Px’)pre fy’) dz da’ — (74) 
Pa’ y) = ff PO, dP.e pe Ay) de dy (78) 

i i XY, YP, Y)Pze Ly’) dy dy’ 
= ff Po, Dpeky’) 108 pee ly) dy dy’. (7) 


Then $-\log W_0$ gives the capacity of this noisy channel for the smallest eigenvalue $W_0$ for which the probability densities obtained from the above five equations satisfy the requirement for ergodicity.


ACKNOWLEDGMENT 


The author is very grateful to Zen’iti Kiyasu, Chief 
of his section, for his advice in refining the material 
herein, and to Asst. Prof. Namio Honda for the stimulating 
inspiration he provided in initiating this work. 

The author is also very grateful to Prof. Robert M. 
Fano for giving him a deeper appreciation of informa- 
tion theory at the Massachusetts Institute of Tech- 
nology. 




Merit Criteria for Communication Systems*


A. HAUPTSCHEIN† AND L. S. SCHWARTZ†


Summary—Merit criteria are presented for the transmission 
of information through noisy channels. The operational problem 
is formulated in such a way as to permit the minimum communi- 
cation cost required to transmit a message over a communi- 
cation system with a specified degree of reliability for given noise 
and loading (traffic density) conditions to be determined. The 
better system transmits the information at less cost. Cost values 
are shown to be functions of the basic merit parameters power, 
bandwidth and time, and the operational loading conditions. The 
formulation is general and can be applied to the evaluation of any 
modulation or coding system. Comparative evaluation of systems 
should result in performance indices which would permit judicious 
choice of systems for use in various operational situations. 


INTRODUCTION 


THIS PAPER formulates cost criteria for evaluating communication systems according to their ability to transmit information through noise. The paper
uses entropy concepts of information theory and, in 
addition, discusses cost factors in relation to message 
density in operational situations. The cost criteria are 
designed to show that the better communication system 
has a given performance for less cost or a better per- 
formance for the same cost. 

Information theory, in assigning a numerical measure 
to the information content of messages, provides an 
excellent starting point for judging system performance. 
In a situation requiring the transmission of a given amount
of information, the parameters, power (P), bandwidth
(W), and time (T), are interrelated through the expression
of information rate. Moreover, information rate may be 
expressed in terms of per-unit equivocation which is a 
measure of reliability, since it expresses the residual un- 
certainty in the received information. Thus, a relation- 
ship exists between the P, W, and T parameters and 
reliability. In a manner to be discussed below, cost 
relations can be formulated such that the total cost of 
communication can be made a function of power, bandwidth, and time.

In evaluating a communication system the constraints 
which exist between P, W, and T for that system are
known. The constraints permit a cost relation to be 
formulated, so that the values of the parameters which 
make the cost a minimum can be found. Having found 
the optimum values of the parameters for any one system 
of communication, one can then find the minimum cost 
of operating that system. In this manner, minimum cost 
for other systems can be determined and comparisons 
between systems made. That system which can transfer 
a given message with specified reliability at least cost 
is optimum. 


* Manuscript received by the PGIT, May 3, 1956. This work was sponsored under Contract No. AF19(604)-1049.
† College of Eng., New York Univ., New York 53, N.Y.


COST FORMULATION


In establishing communication cost relationships, one 
must include initial construction and installation costs, 
maintenance costs, and operational costs. Initial and 
maintenance costs are fixed costs and do not vary significantly with the operational situation in which the system is used. On the other hand, operational costs, as defined here, are functions of the operational conditions. The fixed costs have been considered in engineering economy studies¹ and will not be treated here. It will be the main purpose of this paper to introduce the concept of operational cost and to show its effect on the optimization of communication systems. Communication cost will be defined as the sum of initial, maintenance, and operational costs.

A necessary first step in a cost formulation is to make a 
statement of preferences. In terms of cost, this means that relative weightings are assigned to the basic parameters of the communication problem. Relative weightings enter because, depending on the operational situation, the basic parameters may have different relative importance. Essential to the problem of cost estimation is the specification of the basic parameters which enter into the cost estimate. In communication systems, three parameters seem to be of fundamental importance and
may be considered merit parameters. These are signal 
power, signal bandwidth, and operational time. These 
parameters are basic because: 1) They define information 
rate, 2) they determine initial and maintenance costs, 
and 3) they are essential to an operational cost formu- 
lation. Time, as used in this context, has two aspects. 
One aspect refers to the time to transmit a message over a 
channel; this is a function of the information, the in- 
formation rate, and the coding delay. The other aspect 
refers to the waiting time and is a function of the per- 
centage of full loading of the facilities and the number of 
communication channels. It is concerned with traffic
density and problems of congestion such as arise in tele- 
phone operation and in aircraft control at large airports. 
Transmission of information results in information time 
delay and loading results in waiting time delay. The sum 
of the time delays is called the operational time delay. 

Initial cost is a function of power, because power output 
determines the size and weight of a transmitter; it is a function of bandwidth, because bandwidth determines the size, character, and number of components; and it is a function of time, because time, when utilized in a trade-off with power and bandwidth, implies the use of coders, decoders, and noise filters. Maintenance cost is a function


1 E. L. Grant, “Principles of Engineering Economy,” The Ronald Press Co., New York, N.Y., 3rd ed.; 1950.


of all three merit parameters, because they determine equipment complexity.

Operational cost² is dependent on an ensemble of possible future events. For example, power and time requirements determine energy demands which affect the weight and volume of equipment and, therefore, in the case of airborne operations, the pay load. Also, if the maximum allowable time to transmit the location of a high speed enemy aircraft is exceeded, the result may be catastrophic, and, consequently, the cost formulation must reflect this fact.

To aid in further discussion, a set of channel cost figures is assigned: ${}_\nu C_P$, ${}_\nu C_W$, and ${}_\nu C_T$, called operational power cost, bandwidth cost, and time cost respectively under conditions of $\nu$ loading (density of traffic). In this discussion the symbol $\nu$ refers to light (l), medium (m), or heavy (h) loading. It can, of course, refer to any degree of loading. The total operational cost ${}_\nu C$ on a per channel basis, under the various degrees of loading, can be expressed as the sum of the individual costs.


$${}_l C = {}_l C_P + {}_l C_W + {}_l C_T \tag{1}$$
$${}_m C = {}_m C_P + {}_m C_W + {}_m C_T \tag{2}$$
$${}_h C = {}_h C_P + {}_h C_W + {}_h C_T. \tag{3}$$


The total average operational cost can be obtained from the relation
$$C_O = p(l)\,{}_l C + p(m)\,{}_m C + p(h)\,{}_h C \tag{4}$$
where $p(\nu)$ is the probability of the condition of $\nu$ loading. To this must be added initial and maintenance costs, which are functions of the merit parameters P, W, and T but independent of the degree of loading, and are symbolized respectively as $C_I(P, W, T)$ and $C_M(P, W, T)$. The total average communication cost $C_A$ is then given by
$$C_A = C_O + C_I(P, W, T) + C_M(P, W, T). \tag{5}$$
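
The bookkeeping in (1) through (5) can be sketched in a few lines. This is only an illustration, reading (4) as a probability-weighted average of the per-channel operational costs and (5) as that average plus the fixed costs. The function names and all numerical values below are hypothetical; the paper gives no figures for these quantities.

```python
def channel_operational_cost(power_cost, bandwidth_cost, time_cost):
    """Per-channel operational cost for one loading condition, as in (1)-(3)."""
    return power_cost + bandwidth_cost + time_cost

def average_communication_cost(loading_probability, channel_costs,
                               initial_cost, maintenance_cost):
    """Average operational cost (4) plus initial and maintenance costs (5)."""
    if abs(sum(loading_probability.values()) - 1.0) > 1e-9:
        raise ValueError("loading probabilities must sum to 1")
    operational = sum(
        loading_probability[v] * channel_operational_cost(*channel_costs[v])
        for v in loading_probability
    )
    return operational + initial_cost + maintenance_cost

# Hypothetical values in arbitrary cost units.
loading_probability = {"l": 0.5, "m": 0.3, "h": 0.2}
channel_costs = {"l": (1.0, 1.0, 1.0), "m": (2.0, 2.0, 2.0), "h": (4.0, 4.0, 4.0)}
total = average_communication_cost(loading_probability, channel_costs,
                                   initial_cost=2.0, maintenance_cost=1.0)
print(total)
```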


The form of the cost relationships must be obtained from careful analysis of an operational situation. In general they may be represented by
$$C_P = C_{P0}\, f(\alpha_\nu P_s) \tag{6}$$
$$C_W = C_{W0}\, g(\beta_\nu W_s) \tag{7}$$
$$C_T = C_{T0}\, h(\gamma_\nu T_o) \tag{8}$$
where

$P_s$: average signal power.

$W_s$: signal bandwidth.

$T_o$: operational time; equal to the sum of information time $T_i$ [transmission time plus encoding ($T_e$) and decoding ($T_d$) time] and waiting time $T_w$.

$\alpha_\nu$, $\beta_\nu$, $\gamma_\nu$: weighting constants which are determined from the operational situation.

$C_{P0}$, $C_{W0}$, $C_{T0}$: unit conversion or normalizing factors.


2 Since cost might refer to the dollars and cents value of an operation or to the manpower involved, the units of cost will have arbitrary significance in this paper.




For reasons previously given, initial cost can be divided into component costs as
$$C_I(P, W, T) = C_{IP} + C_{IW} + C_{IT}. \tag{9}$$
Maintenance cost is probably a more complicated function of P, W, and T.

These cost equations must indicate, as realistically as 
possible, the exchange relationship existing between cost 
and the merit parameters. For example, cost must increase 
monotonically with increases in the merit parameters. 
The formulation should reflect a probable nonlinear rise 
of cost with $P_s$, $W_s$, and $T_o$, since there is usually some
value of $P_s$, $W_s$, and $T_o$ beyond which it is extremely
costly to operate because of equipment size, complexity, 
or available time. Weighting constants can be chosen to 
reflect this effect for different operational situations and 
degrees of loading. 

The cost formulation should indicate that the merit 
parameters and unit increases in them are more costly 
for heavy loading than for light loading conditions. The 
reason for this is that for a given communication facility 
there is usually an upper limit to the available supply of 
power and the total assigned frequency band. For example, 
the total frequency band assigned for long wave com- 
munication may be in a frequency range 15 kc to 1500 kc.
This is a fixed requirement that cannot be changed. If 
loading is light, meaning that relatively few channels in 
the band are used, individual channel bandwidths may be 
wide to overcome noise. Also, because of relatively small 
amount of traffic for the light loading condition, re- 
strictions on operational time are not severe. On the other 
hand, if loading is heavy, meaning that all or nearly all 
of the available channels in the band are called into use,
an attempt to combat noise by use of wide bandwidth
channels may be impermissible because of the limitation
on over-all frequency band for the facility. 

These statements suggest that for light loading, mini- 
mum power is more desirable than minimum bandwidth 
and minimum time. The implication of this statement is 
that for light loading wide bandwidth systems, redundancy 
coding, or integration should be used to combat noise. 
For heavy loading narrow bandwidth systems should be 
used because a premium is placed on bandwidth. More- 
over, because waiting time and, therefore, operational 
time are increased in heavy loading, systems employing 
integration or lengthy coding techniques which increase 
information time should be avoided. 

Considerations such as these must be employed in 
obtaining a cost formulation, and the process is one which 
rightly falls into the realm of Operations Research. The 
necessary values of parameters can be obtained from a 
study of past or current operations and they can be 
introduced into a theoretical model of the situation. This 
model, in our case, would be the proposed cost formulation. 
Once the model has been found to be a valid representation 
of the system under study then the values of the param- 
eters may be altered in order to predict the effects of 




adopting new courses of action. This requires the accumula- 
tion of sufficient pertinent data. 


APPLICATION OF MERIT CRITERIA


To illustrate some of the ideas just discussed, consider 
the problem of transmitting a message containing I bits
of information in the presence of additive white Gaussian 
noise with a given reliability or per-unit equivocation. 
First, it is desired to determine the constraining relation 
among the merit parameters, and second, for an assumed 
operational situation, it is desired to determine the 
minimum cost condition. 


Determination of Constraint Equation 


It is assumed we have a message containing frequency components no greater than $W_m$, and with significant amplitude for a length of time $T_m$. Outside this interval the amplitude is presumed small, so that most of the energy is confined within $T_m$ and $W_m$. If
$$2 T_m W_m \gg 1,$$
this message is determined to a high degree of accuracy by its values at $2T_mW_m$ sampling points spaced $1/(2W_m)$ seconds apart. If this message is quantized into B levels, it will contain at most
$$I = 2 W_m T_m \log_2 B \ \text{bits of information.} \tag{10}$$


Suppose it is desired to transmit this message over a symmetric binary (video) channel. This would require sending groups of $\log_2 B$ binary pulses. If n is the number of pulses per group, $2nW_m$ pulses are transmitted per second. This requires a signal bandwidth $W_s \geq nW_m$.

In terms of per-unit equivocation E,³ the channel information rate is
$$R = H(x) - H_y(x) = H(x)\left[1 - \frac{H_y(x)}{H(x)}\right] = 1 - E \ \text{bits per symbol} \tag{11}$$
where

H(x) = entropy before transmission, bits/symbol
     = 1 for binary channel.

$H_y(x)$ = equivocation, bits/symbol⁴
$$H_y(x) = -[p \log_2 p + (1 - p) \log_2 (1 - p)], \tag{12}$$
p = probability of error
$$p = \tfrac{1}{2}\left[1 - \Phi\left(\sqrt{P_s / 2\sigma^2}\right)\right] \tag{13}$$
and

$\Phi(a)$ = error integral
$$\Phi(a) = \frac{2}{\sqrt{\pi}} \int_0^a e^{-t^2}\,dt, \tag{14}$$
$\sigma^2$ = average noise power.


3 R. M. Fano, “The Transmission of Information II,” Res. Lab. of Electronics, M.I.T., Cambridge, Mass., Tech. Rep. No. 149; February, 1950.
4 If the two levels of signal pulses are equiprobable.


From (11) through (13) a relationship can be obtained between E and the merit parameter $P_s$. As is often the case, this relationship is cumbersome to work with and an approximate expression is desirable. For binary PCM, Jelonek⁵ gives as an approximate expression for E
$$E \approx e^{-P_s/(2.17\,\sigma^2)} \tag{15}$$
or
$$P_s = -2.17\,\sigma^2 \ln E. \tag{16}$$
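
A quick numerical check helps show how close the approximation is. The sketch below assumes the following reading of the equations above: p = ½[1 − Φ(√(P_s/2σ²))] with Φ the standard error function, E equal to the binary equivocation of p (since H(x) = 1), and the approximation E ≈ exp(−P_s/2.17σ²). The function names are hypothetical, and `snr` stands for the ratio P_s/σ².

```python
import math

def error_probability(snr):
    """p = 1/2 [1 - Phi(sqrt(P_s / 2 sigma^2))], where snr = P_s / sigma^2."""
    return 0.5 * math.erfc(math.sqrt(snr / 2.0))

def per_unit_equivocation(snr):
    """Exact E = H_y(x) / H(x) for a symmetric binary channel with H(x) = 1."""
    p = error_probability(snr)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def approximate_equivocation(snr):
    """Exponential approximation, as read from (15)."""
    return math.exp(-snr / 2.17)

def required_snr(equivocation):
    """P_s / sigma^2 needed for a given E, as read from (16)."""
    return -2.17 * math.log(equivocation)

for snr in (2.0, 4.0, 10.0):
    print(snr, per_unit_equivocation(snr), approximate_equivocation(snr))
```

Under these assumptions the exact and approximate values agree to within a few parts in a thousand over this range.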

The rate at which information enters the channel per second can be determined from
$$R_s = 2W_s H(x) = 2W_s \ \text{bits per second.} \tag{17}$$
The transmission time $T_t$ is then obtained from (10) and (17) and is
$$T_t = \frac{I}{R_s} = \frac{W_m T_m \log_2 B}{W_s} \tag{18}$$
or
$$T_t W_s = W_m T_m \log_2 B. \tag{19}$$


In this example, encoding and decoding times⁶ are assumed negligible compared to $T_t$, so that information time $T_i = T_t$. If, in addition, a condition of light loading exists and waiting time $T_w$ is zero, $T_o = T_i$.

Multiplying (16) and (19) determines a constraining relation for the merit parameters in terms of per-unit equivocation,
$$P_s T_o W_s = -2.17\,\sigma^2\, W_m T_m \log_2 B\, \ln E. \tag{20}$$


Cost Estimation

A prerequisite for a cost estimate is an operational situation. For illustrative purposes the following situation is assumed. Suppose there are N channels capable of operating from a total available power of $P_T$ watts, and a total bandwidth of $W_T$ cycles per second. Also, let $T_T$ represent the total amount of operating time. Suppose that an analysis of similar past operating conditions indicates that an exponential relationship holds between the operational cost and the merit parameters. Then, (6), (7), and (8) become
$$C_P = C_{P0}\, e^{\alpha_\nu P_s} \tag{21}$$
$$C_W = C_{W0}\, e^{\beta_\nu W_s} \tag{22}$$
$$C_T = C_{T0}\, e^{\gamma_\nu T_o}. \tag{23}$$


The weighting constants $\alpha_\nu$, $\beta_\nu$, and $\gamma_\nu$ can be determined by specifying threshold percentages ${}_\nu x_P$, ${}_\nu x_W$, and ${}_\nu x_T$.


5 W. Jackson, “Communication Theory,” New York, N.Y., p. 52; 1953.
6 In this problem, by encoding time is meant the time to convert the original message into binary signal pulses; and vice versa for decoding time. In general, if coded systems are used, the time involved in coding the signal must be determined.


Here, ${}_\nu x_\mu$ represents that fraction of the total available amount of merit parameter ($\mu$) above which the operational cost per unit increase in the merit parameters grows at an excessive rate. From a careful analysis of the operational or tactical situation it should be possible to determine with suitable accuracy the value of ${}_\nu x_\mu$. For example the following values might be appropriate:

${}_l x_P = 0.25$;  ${}_m x_P = 0.10$;  ${}_h x_P = 0.02$
${}_l x_W = 0.30$;  ${}_m x_W = 0.10$;  ${}_h x_W = 0.01$
${}_l x_T = 0.30$;  ${}_m x_T = 0.10$;  ${}_h x_T = 0.01$


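These values reflect the relative importance of bandwidth and time in heavy loading and that of power in light loading. To determine $\alpha_\nu$, $\beta_\nu$, and $\gamma_\nu$, the cost of ${}_\nu x_P P_T$ watts, ${}_\nu x_W W_T$ cycles per second and ${}_\nu x_T T_T$ seconds must be specified. If this cost is $e^k$, then
$$\alpha_\nu = \frac{k}{{}_\nu x_P\, P_T} \tag{24}$$
$$\beta_\nu = \frac{k}{{}_\nu x_W\, W_T} \tag{25}$$
$$\gamma_\nu = \frac{k}{{}_\nu x_T\, T_T}. \tag{26}$$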
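
The way the threshold fractions set the weighting constants can be sketched as follows. The table of fractions is the example quoted above, read with rows P, W, T and columns light, medium, heavy. That reading, the function names, and the totals P_T, W_T, T_T used here are assumptions for illustration only.

```python
import math

# Example threshold fractions, as read from the text (rows P, W, T by loading).
THRESHOLD_FRACTIONS = {
    "light":  {"P": 0.25, "W": 0.30, "T": 0.30},
    "medium": {"P": 0.10, "W": 0.10, "T": 0.10},
    "heavy":  {"P": 0.02, "W": 0.01, "T": 0.01},
}

def weighting_constants(loading, totals, k=1.0):
    """alpha, beta, gamma = k / (threshold fraction * total available amount)."""
    fractions = THRESHOLD_FRACTIONS[loading]
    return {name: k / (fractions[name] * totals[name]) for name in ("P", "W", "T")}

def operational_cost(values, weights, norms):
    """Exponential cost model: sum of C_0 * exp(weight * value) over P, W, T."""
    return sum(norms[n] * math.exp(weights[n] * values[n]) for n in ("P", "W", "T"))

totals = {"P": 100.0, "W": 1000.0, "T": 10.0}   # hypothetical P_T, W_T, T_T
norms = {"P": 1.0, "W": 1.0, "T": 1.0}          # hypothetical C_P0, C_W0, C_T0
light = weighting_constants("light", totals)
heavy = weighting_constants("heavy", totals)

# Operating exactly at the light-loading thresholds costs e^k per parameter.
at_threshold = {"P": 25.0, "W": 300.0, "T": 3.0}
print(operational_cost(at_threshold, light, norms), 3 * math.e)
```

With the same operating point, the heavy-loading constants give a much larger cost, which is the behavior the thresholds are meant to encode.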


A fact to note is that in the relative comparison of systems the value of k may be removed by normalization. Also, for mathematical convenience, different degrees of loading can be simulated by suitably varying the information to be transmitted. This will maintain the form of the constraining relation (20).

For completion, further assume that from a study of the equipment, the initial cost is found to be proportional to P?, W, and (T, + T,)”*, and that maintenance cost is proportional to (P,, W,, T;). Substitution into $C_I(P, W, T)$ and $C_M(P, W, T)$ gives the fixed cost associated with the analysis.


Cost Minimization


The total average cost $C_A$ is obtained by substituting (24), (25), and (26) into (21), (22), and (23) and the result into (5). It remains to determine the distribution of the merit parameters which minimizes this cost. This is readily done by employing Lagrange's method of undetermined multipliers. That is, form
$$(C_A + \lambda P_s T_o W_s) \tag{27}$$
where $\lambda$ is the undetermined multiplier and then solve simultaneously (28), (29), (30), and (20).
$$\frac{\partial}{\partial P_s}(C_A + \lambda P_s T_o W_s) = 0 \tag{28}$$
$$\frac{\partial}{\partial W_s}(C_A + \lambda P_s T_o W_s) = 0 \tag{29}$$
$$\frac{\partial}{\partial T_o}(C_A + \lambda P_s T_o W_s) = 0 \tag{30}$$
where the constraint equation, (20), is
$$P_s T_o W_s = -2.17\,\sigma^2\, W_m T_m \log_2 B\, \ln E. \tag{20}$$


Following this procedure, the minimum cost behavior of a system with reliability E may be determined. In the comparison of systems for a fixed E, an equation like (20) must be determined for the different systems and the above minimization procedure carried out.
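
The minimization can be sketched numerically for the exponential cost model alone. The sketch makes several assumptions that go beyond the text: it ignores the fixed costs C_I and C_M, treats the right-hand side of the constraint as a given positive number `required_product`, and uses hypothetical function names. Setting the partial derivatives of C + λ·P·T·W to zero implies that x·∂C/∂x takes a common value μ for each of the three parameters, so the code bisects on μ until the product of the three parameters meets the constraint.

```python
import math

def solve_increasing(f, target, hi=1.0, steps=200):
    """Bisection for f(x) = target, where f is increasing on x > 0."""
    lo = 0.0
    while f(hi) < target:          # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def minimize_operational_cost(norms, weights, required_product):
    """Minimize sum c_i * exp(a_i * x_i) subject to x_P * x_W * x_T = required_product."""
    def parameters(mu):
        # Stationarity: x_i * a_i * c_i * exp(a_i * x_i) = mu for each parameter.
        return [
            solve_increasing(lambda x, a=a, c=c: x * a * c * math.exp(a * x), mu)
            for c, a in zip(norms, weights)
        ]

    def product(mu):
        p, w, t = parameters(mu)
        return p * w * t

    mu = solve_increasing(product, required_product)
    p, w, t = parameters(mu)
    cost = sum(c * math.exp(a * x) for c, a, x in zip(norms, weights, (p, w, t)))
    return p, w, t, cost

# Symmetric hypothetical case: the optimum splits the product evenly, P = W = T = 2.
p, w, t, cost = minimize_operational_cost((1.0, 1.0, 1.0), (1.0, 1.0, 1.0), 8.0)
print(p, w, t, cost)
```

Because each cost term is convex and increasing, the stationary point found this way is the constrained minimum for this simplified model. Any additional restriction, such as a minimum signal bandwidth, would have to be imposed separately.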


CONCLUSION


In this paper, merit criteria are presented for the 
transmission of information through noisy channels. 
The operational problem is formulated in such a way as 
to permit the minimum cost required to transmit a 
message over a communication system with specified 
reliability for given noise and loading conditions to be 
determined. Per-unit equivocation is used as a measure 
of reliability. The better system transmits the information 
at less cost. Cost is shown to be a function of the merit 
parameters, power, bandwidth, and time, and to vary 
with the operational situation. The formulation is general, 
requiring the determination of a constraining relation 
among the merit parameters only. Comparative evaluation 
of various modulation or coding schemes by the above 
method would result in performance indexes, permitting 
the judicious choice of systems for use in various opera-
tional situations. 


ACKNOWLEDGMENT 


The authors are grateful for many helpful suggestions 
from Dr. E. W. Sampson and C. F. Hobbs, of the Com- 
munication Laboratory, Air Force Cambridge Research 
Center and the Air Research and Development Com- 
mand, Cambridge, Mass., and from F. Bloom of New 
York University, New York, N.Y. 




First-Order Markov Process Representation of
Binary Radar Data Sequences*


GEORGE C. SPONSLER†


Summary—Study of radar detection-trial data sequences has 
indicated the existence of interscan correlation. The theory of simple 
or first-order Markov chains is here applied to characterize the 
statistics of such sequences of correlated binary data consisting 
of detections (1’s) and nondetections (0’s) of a tracked target 
upon successive radar scans. Both stationary discrete and non- 
stationary continuous parameter processes are considered, for 
which relations are derived between the four transition probabilities, $p_{i,j}$, and the absolute detection probability, $\beta$, the so-called blip-scan ratio.

The discrete parameter, first-order Markov chain theory, 
presented first, is extended to the case wherein the blip-scan ratio 
may be expressed as a function of time. It is possible to employ 
the resulting nonstationary, continuous parameter solution to 
simulate radar data for aircraft flights of arbitrary patterns. Certain 
restrictions upon the admissible class of blip-scan functions are 
presented. In the case of the continuous parameter first-order 
process, the scan-to-scan correlation coefficient is shown to be 
restricted to positive values. An application is made to an auto- 
matic initiation problem. 


INTRODUCTION 


REPRESENTATION of radar data for purposes of theoretical study, simulation, or to aid in the design of automatic data-processing equipment,
requires a mathematical model which will satisfactorily 
represent the statistics of such data. In this paper we 
shall develop such a representation for binary radar data, 
that is, for sequences of 1’s and 0’s representing detections 
and nondetections respectively, which are the output of 
some decision-making mechanism which decides whether 
or not a target is considered to be detected on each 
successive scan. The formal mathematical treatment of 
the problem in Section I is an application and extension 
of the theory of stationary, discrete parameter, two-state 
Markov chains; it is assumed ab initio that the radar 
statistics may indeed be represented by such first-order 
processes. In Section II, the theory is extended to a 
nonstationary, continuous parameter representation. 

In the concepts to be developed, scan-to-scan, not 
pulse-to-pulse, relationships are considered, in which a 
search radar is presumed to be tracking a particular 
target. Each scan of the radar is a detection trial, in the 
statistical sense, resulting in either a detection or non- 
detection. This paper is concerned with the representa- 
tion of the scan-to-scan, binary, detection-decision data 
of a search radar operating in a ‘‘track-while-scan”’ mode. 

Study of radar data sequences at a number of different 
laboratories has indicated the existence of interscan 


* Manuscript received by the PGIT, July 12, 1956. The research 
in this document was supported jointly by the U.S. Army, Navy, and 
Air Force under contract with the Mass. Inst. of Tech. 

† Lincoln Lab., M.I.T., Lexington, Mass.




correlation. To the author's knowledge, the possible existence of scan-to-scan correlation was first proposed in 1953 in a classified report by G. R. Lindsey of the Canadian Defence Research Board.¹ Lindsey's colleague, P. S. Olmstead of the Bell Telephone Laboratories, concurred in this proposal and subsequently developed a single-parameter statistical model employing a linearizing assumption which would represent radar binary data sequences for a particular type of aircraft and for a given interval of constant detection probability (i.e., blip-scan ratio).² The work of the present author was initiated in
an attempt to give a more precise mathematical basis for 
the description of scan-to-scan correlation and to extend 
the theory wherever possible. 


SECTION I


It was the conclusion of Lindsey and Olmstead that binary radar data should be analyzed on the assumption that the probability of detection of an individual radar echo was explicitly dependent upon the detection (or nondetection) of the target echo of the immediately preceding scan but upon no earlier scan. Thus, in the mathematical treatment, conditional detection probabilities were introduced.

As the detection probability upon each scan in this newer model depends in turn upon its predecessor, such probabilities are seen to form an interrelated chain of probabilities. In this section we assume the various probabilities are constant (such an assumption is satisfied by circular aircraft flights about the search radar). The statistics which relate the binary data of such a representation are an example of a stationary, discrete parameter, first-order, or “simple,” two-state Markov chain, the two states being detection or nondetection (colloquially known as “hits” or “misses”) respectively. The statistics are stationary as they are independent of the initial instant of time chosen, and they are discrete with sample period equal to the radar scan period.

The mathematical formulation of first-order, two-state Markov chains is well known and readily available in texts on probability theory.³ A table of conditional probability elements is considered of the form:


1G. R. Lindsey, “Signal Correlation Between Successive Scans 
of a G.C.L. Radar,” Defence Research Board-O.R.G.-34; March, 
1953. (Title unclassified.) 

2 P. S. Olmstead, Bell Telephone Labs. Studies Case No. 27420, January 26, 1955; No. 27682, April 2, 1954.

3 W. Feller, “Probability Theory and Its Applications,” John Wiley and Sons, Inc., New York, N.Y., ch. 15; 1950.


$$[p_{i,j}] = \begin{array}{c|cc} i \backslash j & 0 & 1 \\ \hline 0 & p_{0,0} & p_{0,1} \\ 1 & p_{1,0} & p_{1,1} \end{array} \tag{1}$$

where the states 0 and 1 represent misses and hits, respectively, and $p_{i,j}$ is the conditional probability that a hit or miss on any given scan will be followed by a hit or miss on the next scan. Thus, $p_{1,1}$ is the conditional probability that a hit will immediately follow a hit. The matrix $[p_{i,j}]$ of (1) is the stochastic matrix of a simple, two-state, Markov chain.

In order that the matrix (1) represent a physical radar detection model, it must be true that either a hit or a miss will follow every hit or miss (i.e., one or the other must occur). Hence, it is necessary that
$$p_{0,0} + p_{0,1} = 1, \qquad p_{1,0} + p_{1,1} = 1, \tag{2}$$
i.e., the sum of the conditional probabilities of the two possible events following a hit or a miss must equal unity in each case.

Furthermore, the total probability that a hit or miss will follow an unknown preceding event must be equal to $\beta$ or $(1 - \beta)$ respectively, where $\beta$ is the unconditional or absolute detection probability, commonly known as the blip-scan ratio. $\beta$ is thus the “single-look” probability that, knowing nothing of the preceding results, the signal return will be detected on a particular scan. The mathematical statement of this definition is that
$$(1 - \beta)\,p_{0,0} + \beta\,p_{1,0} = 1 - \beta, \qquad (1 - \beta)\,p_{0,1} + \beta\,p_{1,1} = \beta. \tag{3}$$


The last equation states, for example, that the unconditional probability of detection on a particular scan equals the sum of the probabilities that the preceding scan-state, which is either a miss or hit, will undergo a transition to a hit on the particular scan in question. For this reason, the $p_{i,j}$ are also often known as transition probabilities.

When one regards $\beta$ as known, we thus have four equations, (2) and (3), in four unknowns (i.e., $p_{0,0}$, $p_{1,0}$, $p_{0,1}$, and $p_{1,1}$). Consideration of the system determinant, $\Delta$, of this set of equations demonstrates that the rank of the linear set is 3 because $\Delta = 0$, but $\Delta \neq 0$ when one equation is eliminated from the set. Hence, as shown in the general theory of linear equations,⁴ a one-parameter system of solutions exists. Mathematically, the parameter is completely arbitrary; a convenient form of the solution is given by


4 F. B. Hildebrand, “Methods of Applied Mathematics,” Prentice-Hall, New York, N.Y., Sections 1.7 and 1.8; 1952.




$$[p_{i,j}] = \begin{bmatrix} 1 - a\beta & a\beta \\ a(1 - \beta) & 1 - a(1 - \beta) \end{bmatrix} \tag{4}$$

where a is the arbitrary parameter. This solution may be checked by direct insertion in (2) and (3). It is interesting to note the linear dependence of the $p_{i,j}$ upon $\beta$ which justifies the linearizing assumption in Olmstead's model. A more meaningful form of the solution is given by computing the scan-to-scan correlation coefficient, $\rho$, of a stationary, discrete parameter, first-order Markov chain. By definition the correlation coefficient between successive pairs of a random variable, x, is given by
$$\rho = \frac{\overline{x_i x_{i+1}} - (\bar{x})^2}{\sigma^2(x)},$$


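where the bars indicate average values and $x_i$ represents the value of x at time i. In our case all terms but one equal zero, leaving:
$$\rho = \frac{1 \cdot 1 \cdot \beta\, p_{1,1} - (1 \cdot \beta)^2}{\beta(1 - \beta)} = \frac{p_{1,1} - \beta}{1 - \beta}.$$
From (4) this expression becomes
$$\rho = \frac{1 - a(1 - \beta) - \beta}{1 - \beta}$$
or
$$\rho = 1 - a. \tag{5}$$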


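Thus the parameter a is equal to one minus the scan-to-scan correlation coefficient. Substituting for a in (4) we obtain the more meaningful solution:
$$[p_{i,j}] = \begin{bmatrix} (1 - \beta) + \rho\beta & \beta - \rho\beta \\ (1 - \beta) - \rho(1 - \beta) & \beta + \rho(1 - \beta) \end{bmatrix} \tag{6}$$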
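
The solution can be checked numerically. The sketch below builds the two-state stochastic matrix from the blip-scan ratio β and the scan-to-scan correlation coefficient ρ, taking the entries to be p00 = (1 − β) + ρβ, p01 = β − ρβ, p10 = (1 − β) − ρ(1 − β), p11 = β + ρ(1 − β), and then verifies the row-sum condition (2) and the stationarity condition (3). The function name and the sample values of β and ρ are hypothetical.

```python
def transition_matrix(beta, rho):
    """Two-state stochastic matrix [p_ij] for blip-scan ratio beta and correlation rho.
    State 0 is a miss, state 1 is a hit; p[i][j] is P(next = j | current = i)."""
    matrix = [
        [(1.0 - beta) + rho * beta, beta - rho * beta],
        [(1.0 - beta) - rho * (1.0 - beta), beta + rho * (1.0 - beta)],
    ]
    if any(p < 0.0 or p > 1.0 for row in matrix for p in row):
        raise ValueError("beta and rho do not give valid probabilities")
    return matrix

beta, rho = 0.6, 0.25                      # hypothetical values
p = transition_matrix(beta, rho)
row_sums = [sum(row) for row in p]         # condition (2): each row sums to 1
hit_probability = (1.0 - beta) * p[0][1] + beta * p[1][1]   # condition (3): equals beta
print(p, row_sums, hit_probability)
```

The validity check matters because not every pair (β, ρ) is admissible: a strongly negative ρ can push an entry below zero.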


If experimental data are available (preferably in large quantities) then an experimental stochastic matrix of the form given by (1) may be constructed. Let the experimental frequency of pairs of hits, $f_{11}$, pairs of misses, $f_{00}$, and pairs of hits and misses, $f_{01}$ and $f_{10}$ respectively, be determined from the radar data, where $f_{ij}$ counts occurrences of state i followed by state j; then it follows that
$$p_{0,0} = \frac{f_{00}}{f_{00} + f_{01}}, \quad p_{0,1} = \frac{f_{01}}{f_{00} + f_{01}}, \quad p_{1,0} = \frac{f_{10}}{f_{10} + f_{11}}, \quad p_{1,1} = \frac{f_{11}}{f_{10} + f_{11}}. \tag{7}$$
Bartlett proves these expressions are actually the maximum likelihood estimators of the various transition probabilities of a stationary first-order Markov chain.⁵ The statistics are ergodic so single-sequence and ensemble averages may be interchanged.
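
A minimal sketch of this estimation step is given below, assuming the estimators are the pair counts normalized by row, i.e. p_ij = f_ij / (f_i0 + f_i1). The function names and the short sample sequence are hypothetical. The second function recovers β and ρ from the estimated matrix using the relations p01 = aβ, p10 = a(1 − β), and ρ = 1 − a from the one-parameter solution.

```python
def estimate_transition_matrix(sequence):
    """Estimate [p_ij] from a binary scan sequence by counting successive pairs."""
    counts = [[0, 0], [0, 0]]
    for previous, current in zip(sequence, sequence[1:]):
        counts[previous][current] += 1     # f_ij: state i followed by state j
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state never occurs as a predecessor")
        matrix.append([c / total for c in row])
    return matrix

def blip_scan_and_correlation(matrix):
    """Recover (beta, rho) using a = p01 + p10, beta = p01 / a, rho = 1 - a."""
    a = matrix[0][1] + matrix[1][0]
    if a == 0.0:
        raise ValueError("chain never changes state; beta is undetermined")
    return matrix[0][1] / a, 1.0 - a

scans = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]     # hypothetical hit/miss record
p_hat = estimate_transition_matrix(scans)
beta_hat, rho_hat = blip_scan_and_correlation(p_hat)
print(p_hat, beta_hat, rho_hat)
```

A sequence this short is only for illustration; as the text notes, large quantities of data are preferable.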

What now is the probability that n scans following an 
initial observation, the second observation will result in 
a hit or a miss, assuming nothing is known as to the 


5M. S. Bartlett, Proc. Camb. Phil. Soc., vol. 47, pp. 86-95; 1951. 




intervening events except their number? Such a question is answered in the theory of Markov chains by use of the higher-order transition matrix, $[p^{(n)}_{i,j}]$ (which is not to be confused with the stochastic matrix of a higher-order Markov chain). This transition matrix is the array of conditional or transition probabilities that $n$ trials (scans) after an initial, observed event, the $n$th trial will result in a particular event (hit or miss). Then

$$[p^{(n)}_{i,j}] = \begin{bmatrix} p^{(n)}_{0,0} & p^{(n)}_{0,1} \\ p^{(n)}_{1,0} & p^{(n)}_{1,1} \end{bmatrix} \tag{8}$$

where $p^{(n)}_{i,j}$ represents the conditional probability that, if the initial observation on the first scan is represented by $i$, the second observation on the $n$th scan will be $j$.

The mathematical derivation of $[p^{(n)}_{i,j}]$ will not be presented here. The general result, derived by Feller, employing the stochastic matrix (6) gives:⁶

$$[p^{(n)}_{i,j}] = \begin{bmatrix} (1-\beta) + \rho^n\beta & \beta - \rho^n\beta \\ (1-\beta) - \rho^n(1-\beta) & \beta + \rho^n(1-\beta) \end{bmatrix} \tag{9}$$

It should be observed that if $n = 1$, then $[p^{(1)}_{i,j}] = [p_{i,j}]$ as would be expected. We should also expect that as the chain of events grows longer, the conditional probability elements of $[p^{(n)}_{i,j}]$ should approach the absolute probabilities, $(1-\beta)$ and $\beta$, in the respective columns. This means we would expect that as $n$ grows larger, the particular initial event of the chain would have less and less influence on the result of the $n$th trial. If we let $n \to \infty$ in the expression (9) for $p^{(n)}_{i,j}$, we obtain the anticipated result, namely

$$\lim_{n\to\infty} [p^{(n)}_{i,j}] = \begin{bmatrix} (1-\beta) & \beta \\ (1-\beta) & \beta \end{bmatrix}. \tag{10}$$

This is the stochastic matrix of independent (i.e., uncorrelated) events.
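The closed form (9) and its limit (10) can be compared against repeated multiplication of the one-step matrix (6). The Python sketch below does this for illustrative values of the blip-scan ratio and correlation coefficient; the function names and the numbers 0.3, 0.4, 5 and 200 are assumptions made for the example.

```python
def n_step_closed_form(beta, rho, n):
    """Higher-order transition matrix of Eq. (9)."""
    r = rho ** n
    return [
        [(1 - beta) + r * beta, beta - r * beta],
        [(1 - beta) - r * (1 - beta), beta + r * (1 - beta)],
    ]


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def n_step_by_product(beta, rho, n):
    """Multiply the one-step matrix (n = 1 in Eq. (9)) by itself n times."""
    one_step = n_step_closed_form(beta, rho, 1)
    out = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        out = mat_mul(out, one_step)
    return out


closed = n_step_closed_form(0.3, 0.4, 5)
product = n_step_by_product(0.3, 0.4, 5)
max_gap = max(abs(closed[i][j] - product[i][j]) for i in range(2) for j in range(2))

# For large n both rows approach the absolute probabilities (1 - beta, beta).
limit = n_step_closed_form(0.3, 0.4, 200)
```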

We now turn to the determination of the variance, $\sigma^2$, to be associated with the distribution of the number of detections in a sequence of scans generated by a simple Markov chain process. Let us define a stochastic variable $N_n$ to be the number of hits in $n$ scans. It is known that the distribution of $N_n$ is asymptotically normal. The mean and variance thereof may be determined analytically from the mean and variance of another, related random variable distribution, namely, the distribution of the so-called recurrence times, $X_k$, to be defined hereafter. If $\mu$ and $\sigma^2$ are the mean and variance of this latter distribution, then the mean, $\bar{N}_n$, and variance, $\sigma^2(N_n)$, of the distribution of $N_n$ are given by

$$\bar{N}_n \approx \frac{n}{\mu} \tag{11}$$

$$\sigma^2(N_n) \approx \frac{n\sigma^2}{\mu^3}; \tag{12}$$

as $n \to \infty$ the approximation becomes better.

6 Feller, op. cit., p. 351.

We must, therefore, first determine the mean and variance of the distribution of the recurrence times.

It is shown in the theory of probability that, associated with an event, $E$, there is a random variable $X_k$, called the recurrence time, which is defined to be equal to the number of trials following the $(k-1)$st occurrence of $E$ up to and including the $k$th occurrence.

Now under certain conditions satisfied in our problem, it is known that

$$\lim_{n\to\infty} p^{(n)}_{j,j} = \frac{1}{\mu_j} \tag{13}$$

where $p^{(n)}_{i,j}$ is the $i, j$th element of the $[p^{(n)}_{i,j}]$ matrix, and $\mu_j$ is the mean value of the recurrence time of the state $j$. We know from (10), for example, that $\lim_{n\to\infty} p^{(n)}_{1,1} = \beta$, and hence that $\mu_1 = 1/\beta$.
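The explicit, analytical expression for $\sigma^2(X)$ is known to be,⁷

$$\sigma^2 = \mu_j - \mu_j^2 + 2\mu_j^2 \sum_{n=0}^{\infty}\left[p^{(n)}_{j,j} - \frac{1}{\mu_j}\right] \tag{14}$$

if we are treating the recurrence time of an event the conditional probability of which is $p^{(n)}_{j,j}$ (note, we take $p^{(0)}_{j,j} = 1$). Let the event considered be that of detection. From (9) we see

$$p^{(n)}_{1,1} = \beta + \rho^n(1-\beta).$$

Hence

$$\sum_{n=0}^{\infty}\left[p^{(n)}_{1,1} - \frac{1}{\mu_1}\right] = \sum_{n=0}^{\infty}\rho^n(1-\beta) = \frac{1-\beta}{1-\rho}.$$

Hence (14) becomes (assuming hereafter $|\rho| < 1$),

$$\sigma^2(X) = \frac{1}{\beta} - \frac{1}{\beta^2} + \frac{2}{\beta^2}\,\frac{(1-\beta)}{(1-\rho)}. \tag{15}$$

From (11) and (12) we have the estimates

$$\bar{N}_n \approx n\beta \tag{16}$$

and

$$\sigma^2(N_n) \approx n\beta^3\sigma^2(X) \approx n\beta(1-\beta)\left[\frac{1+\rho}{1-\rho}\right], \qquad |\rho| < 1. \tag{17}$$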


It is interesting to observe at this point, that in the case of unconditional or independent events, $p_{1,1} = \beta$ (i.e., $\rho = 0$) and (17) takes the form,

$$\sigma^2(N_n;\ \rho = 0) = n\beta(1-\beta). \tag{18}$$


7 Feller, op. cit., p. 362. 


This expression is merely the variance of the binomial distribution which is to be expected with independent events, and hence provides an interesting check of the validity of (17) in that case.
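The chain from the recurrence-time variance (15) to the estimates (16) and (17) can be followed numerically. The Python sketch below evaluates (11) and (12) with the recurrence-time mean $1/\beta$ and the variance (15), and compares the result with the closed form (17) and with the binomial case (18). The function name and the sample values are illustrative assumptions.

```python
def hits_mean_variance(n, beta, rho):
    """Asymptotic mean and variance of the number of hits in n scans.

    Uses Eqs. (11) and (12) with mu = 1/beta and the recurrence-time
    variance of Eq. (15); requires 0 < beta <= 1 and |rho| < 1.
    """
    mu = 1 / beta
    var_x = mu - mu ** 2 + 2 * mu ** 2 * (1 - beta) / (1 - rho)   # Eq. (15)
    mean_n = n / mu                                               # Eq. (11)
    var_n = n * var_x / mu ** 3                                   # Eq. (12)
    return mean_n, var_n


mean_corr, var_corr = hits_mean_variance(100, 0.5, 0.5)

# Closed form of Eq. (17) for the same values.
closed_form = 100 * 0.5 * (1 - 0.5) * (1 + 0.5) / (1 - 0.5)

# Independent scans (rho = 0) reduce to the binomial variance of Eq. (18).
_, var_indep = hits_mean_variance(100, 0.3, 0.0)
```

For these sample values a correlation of 0.5 triples the variance relative to independent scans with the same blip-scan ratio.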

From the fact that $p_{1,1} = \beta + \rho(1-\beta)$, we see that $p_{1,1}$ as a function of $\beta$ will be given by a family of straight lines intersecting at the point (1, 1), with $\rho$ the family parameter. It is known that $-1 < \rho < 1$. For positive values of $\rho$ ($0 < \rho < 1$), as all probabilities are limited to the interval zero to one, we see that $p_{1,1} > \beta$, which corresponds to the region above the 45° diagonal in Fig. 1. The diagonal itself corresponds to the case of scan independence where $\rho = 0$.

Fig. 1—Permitted values of $p_{1,1}$ for $-1 < \rho < 1$. (Abscissa: blip-scan probability, $\beta$; ordinate: $p_{1,1} = \beta + \rho(1-\beta)$. Marked on the plot: positive correlation region; region absolutely prohibited; locus of intersections of $\beta = 1/(1-\rho)$ and $p_{1,1} = \beta + \rho(1-\beta)$.)

When $\rho$ is negative, not all values of $p_{1,1}$ are permissible. We know that $p_{1,1}$ and $p_{0,0}$, for example, must lie in the interval 0 to 1. Thus we require,

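$$1 \ge \beta + \rho(1-\beta) \ge 0$$

$$1 \ge (1-\beta) + \rho\beta \ge 0.$$

Solving for $\beta$, we find

$$\beta \ge \frac{-\rho}{1-\rho} \tag{19}$$

$$\beta \le \frac{1}{1-\rho}. \tag{20}$$

Now, if $0 < \rho < 1$, the inequalities (19) and (20) merely require that $1 > \beta > 0$, of which we were aware to begin with. However, if $-1 < \rho < 0$, then we find

$$\frac{1}{1-\rho} \ge \beta \ge \frac{-\rho}{1-\rho}. \tag{21}$$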




The right side of this inequality is always automatically satisfied in any plot of $p_{1,1}$ vs $\beta$. However, in the case of negative or anticorrelation, we must also satisfy the left side of inequality (21). Thus in the case of anticorrelation, $p_{1,1}$ as a function of $\beta$ is restricted to the region below the 45° "independent-scan" diagonal bounded by that diagonal, the axis $p_{1,1} = 0$, and the locus of the points of intersection of the various lines $p_{1,1} = \beta + \rho(1-\beta)$ with the vertical lines $\beta = 1/(1-\rho)$. This locus of intersections is plotted in Fig. 1, and the region of anticorrelation is depicted by the broken lines. (Note the degenerate point where $\rho = -1$ and $\beta = \tfrac{1}{2}$.)
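The permitted region can be explored with a few lines of Python. The sketch below assumes the bounds on $\beta$ are those obtained by requiring $p_{1,1} \ge 0$ and $p_{0,0} \ge 0$, and cross-checks them against the four entries of the stochastic matrix (6). The function names and test points are illustrative assumptions.

```python
def beta_bounds(rho):
    """Lower and upper limits on beta for a given rho, with -1 <= rho < 1.

    Lower limit from p11 >= 0, upper limit from p00 >= 0, both clipped to [0, 1].
    """
    lower = max(0.0, -rho / (1 - rho))
    upper = min(1.0, 1 / (1 - rho))
    return lower, upper


def is_admissible(beta, rho, eps=1e-12):
    """True if beta lies inside the permitted interval for this rho."""
    lower, upper = beta_bounds(rho)
    return lower - eps <= beta <= upper + eps


def entries_are_probabilities(beta, rho, eps=1e-12):
    """Direct check that all four entries of Eq. (6) lie in [0, 1]."""
    entries = [
        (1 - beta) + rho * beta, beta - rho * beta,
        (1 - beta) - rho * (1 - beta), beta + rho * (1 - beta),
    ]
    return all(-eps <= e <= 1 + eps for e in entries)


degenerate = beta_bounds(-1.0)       # complete anticorrelation
positive = beta_bounds(0.5)          # positive correlation: no extra restriction
```

For complete anticorrelation the interval collapses to the single point $\beta = \tfrac{1}{2}$, the degenerate point noted in the text.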

Anticorrelation means a tendency exists toward alternation of hits and misses. Complete anticorrelation ($\rho = -1$) corresponds to sequences of the form (01010101 ···). To date, no such anticorrelation has been observed in actual radar data, and it is believed only the positive correlation region has physical realization. In this regard, Olmstead presents an experimental plot of $p_{1,1}$ (called x'x in his terminology) vs the blip-scan ratio, $\beta$, for a particular aircraft at a particular range and altitude. This graph is reproduced as Fig. 2; we note the linear dependence of $p_{1,1}$ on $\beta$ and that we have positive correlation. Olmstead's model corresponds to a scan-to-scan correlation coefficient of ½.

Fig. 2—$p_{1,1}$ vs $\beta$ from Olmstead. (Abscissa: blip-scan ratio, $\beta$. Plot labels: B-29 aircraft, outgoing and incoming flights; proposed model based on scan-to-scan association, $\rho = 1/2$, positive correlation; independent-event binomial distribution.)

In this first section we have presented the statistics of a stationary, discrete parameter, first-order, two-state Markov chain process as applied to the representation of radar detection-trial data sequences. We have derived expressions for the transition probabilities and various related inequalities. We have also derived expressions for the mean and variance of the distribution of hits, $N_n$. We have not given any test to determine whether or not a simple Markov chain process is adequate to describe a particular set of radar data. Such a test, together with the statistics of higher-order Markov processes, is discussed along with other topics in a previous report by the author.⁸


Section II


In the previous section, the details of discrete parameter simple Markov chain binary data representation were developed on the assumption that the probabilities considered were not time (and hence not range) dependent. Strictly, that application of discrete parameter Markov chains is applicable only to airplane flights wherein the test aircraft flies a circular pattern about the radar. Thus, it is desirable to develop a Markov-type process which is applicable to nonradial (i.e., arbitrary) flight patterns. Such a process would lead to markedly simplified radar data simulation, as well as to a more accurate and realistic representation of binary radar detection-trial data sequences.

The desired extension is obtained by use of nonstationary, continuous parameter, Markov processes. Of course, "nonstationary" means that the solution as a function of the time does not depend simply upon the difference in time which elapses between two successive looks (or between the two looks under consideration regardless of the number of scans coming between them), but rather is a function both of the initial and of the final observation times, $s$ and $t$, the "continuous parameters". General expositions on such processes are to be found, among others, in the texts by Feller⁹ and Doob.¹⁰


ANALYSIS 


In the mathematical treatment of continuous parameter Markov processes it is necessary to replace the fixed transition probabilities of the corresponding order discrete parameter Markov process with probability functions, that is, by probabilities which are functions of the continuous parameter $t$ (e.g., the time at which an aircraft under surveillance is at a particular range and azimuth). It is important to note in both the discrete and the continuous parameter cases we have been considering, that the aircraft is being tracked, i.e., that it is under continuous surveillance. Therefore we are not attempting to locate an aircraft echo submerged in noise; rather, the radar is considered to "know" where the aircraft is at any particular time and decides (automatically or by observer) whether a particular scan over the aircraft results in a detection [1] or nondetection [0], the data states.

The transition probability function of a nonstationary 
process must be a function of the two parameters, s, 
corresponding to the time at which the radar data is first 


8G. C. Sponsler, “Markov Process Representation of Radar 
Detection-Trial Binary Data Sequences,’’ Lincoln Lab. Tech. 
Rep. No. 114; April 16, 1956. 

9W. Feller, ‘‘An Introduction to Probability Theory and Its 
Applications,’ John Wiley and Sons, Inc., New York, N.Y., ch. 
17; 1950. 

10 J. L. Doob, "Stochastic Processes," John Wiley and Sons, Inc., ch. 6; 1953.




observed, and $t$, corresponding to the later time at which the data is observed for the second time. Thus the $p_{i,j}$ of the discrete parameter process are replaced by the function $p_{i,j}(s; t)$. Given that initially, at time $s$, the statistics were in state $i$ (= 1 or 0), the transition probability function $p_{i,j}(s; t)$ is the probability that upon the next "look" at the later time $t$, the radar data is in state $j$ (= 1 or 0). It develops from the theory of continuous parameter Markov processes that these transition probability functions are solutions of a set of differential equations first derived by Kolmogorov. These differential equations themselves are directly obtainable, as will be shown, from the Chapman-Kolmogorov equation:

$$p_{i,k}(s; u) = \sum_j p_{i,j}(s; t)\,p_{j,k}(t; u), \qquad 0 \le s \le t \le u. \tag{22}$$

This equation is a mathematical statement of the fact that the state $k$ at time $u$ can be reached from the initial state $i$ at time $s$ through any intermediate state $j$ at time $t$. Generally, it is impossible to solve for $p_{i,j}(s; t)$ directly from these equations; rather, a system of differential equations is derived from (22) from which the $p_{i,j}(s; t)$ may be determined provided certain boundary conditions are known.

Without going into the rigorous development, let us make the following definition of the functions¹¹ $c_i(t)$ and $c_{i,j}(t)$ (compare Feller¹²).

Define

$$\left.\frac{\partial p_{i,j}(s; t)}{\partial s}\right|_{s=t} = \begin{cases} c_i(t), & i = j \\ -c_{i,j}(t), & i \ne j \end{cases} \tag{23}$$

$$\left.\frac{\partial p_{i,j}(s; t)}{\partial t}\right|_{t=s} = \begin{cases} -c_i(s), & i = j \\ c_{i,j}(s), & i \ne j. \end{cases} \tag{24}$$

Now if in (22) we take partial derivatives with respect to $s$, thereafter setting $s$ equal to $t$ and then finally replacing the pair $(t; u)$ by the pair $(s; t)$ we find:

$$\frac{\partial p_{i,k}(s; t)}{\partial s} = c_i(s)\,p_{i,k}(s; t) - \sum_{j \ne i} c_{i,j}(s)\,p_{j,k}(s; t). \tag{25}$$

Taking partial derivatives of (22) with respect to $u$, and then setting $u$ equal to $t$, we obtain:

$$\frac{\partial p_{i,k}(s; t)}{\partial t} = -c_k(t)\,p_{i,k}(s; t) + \sum_{j \ne k} c_{j,k}(t)\,p_{i,j}(s; t). \tag{26}$$

What appear to be single equations in both (25) and (26) are actually two systems of equations for all states $i$ and $k$, where $i$ and $k$ (having values 0 and 1 in the first-order case) could be vector states in a higher-order Markov process. The system of (25) is called the backwards system, because it involves differentiation with respect to the time of the first observation, $s$; the system (26) is called the forward system, because it involves differentiation with respect to the (later) time, $t$, of the


11 Doob, op. cit., p. 254.
12 Feller, op. cit., p. 388.




second observation. In practical cases any solution of (25) will automatically be the solution of (26) and vice versa. Hence practically only one set of these equations need be solved to determine the $p_{i,j}(s; t)$.

In principle the Kolmogorov differential equations (25) or (26) permit the solution of the nonstationary, continuous parameter Markov process of any order, provided one knows a priori the functions $c_i(t)$ and $c_{i,j}(t)$. The latter functions may be interpreted as probability densities; that is to say $c_k(t)$, for example, when multiplied by a small increment of time, $\Delta t$, would approximate the probability that during the time interval $t$ to $(t + \Delta t)$ a change in state occurs out of state $k$. The actual probability would include an error term, a function of $t$, which would approach zero as $\Delta t$ approached zero. Thus in practical cases, when treating higher-order, nonstationary, continuous parameter Markov processes, it is necessary to estimate the $c_{i,j}(t)$ functions on the basis of such a probability interpretation. In most cases this functional dependence is far from obvious, and hence, although in principle a higher-order nonstationary Markov process may be solved by use of (25) or (26), in practical cases it is quite difficult because of lack of knowledge of the functional form of the $c_{i,j}$'s. Fortunately the first-order case turns out to be comparatively simple in our radar application. The $c_{i,j}$'s in this case may be ascertained by a judicious guessing process, and the validity of the results proven by insertion in (22) through (26). We shall therefore demonstrate the reasoning which leads to the recognition of the solution for the nonstationary, continuous parameter, first-order Markov process.

In brief, what we shall do will be as follows: We shall show [in (32) and (33)] that in the first-order case $c_{i,j} = c_i$. In (36) and (37) we shall relate $c_0$ and $c_1$ to $\beta$, and then exhibit a possible pair, $c_0$ and $c_1$, in (43) which 1) solve (37), and 2) reduce to a known form when $\beta$ is constant. If the reader accepts the plausibility of these $c_i$, he must accept (46) as a unique answer to the original problem. (This paragraph courtesy of the reviewer.)

Let us look then, at the Kolmogorov differential equations in the case of the nonstationary, continuous parameter, first-order Markov process. The subscripts $i$, $j$, $k$ can take only the two values 0 and 1, and either (25) or (26) represents a system of four linear, differential equations in the unknowns $p_{0,0}(s; t)$, $p_{0,1}(s; t)$, $p_{1,0}(s; t)$, $p_{1,1}(s; t)$. In our case it develops that the set of four equations (25) or (26) separates into two pairs of equations, each pair of which constitutes the differential equations for a particular pair of $p_{i,j}(s; t)$. For example, the two pairs of "forward" equations may be written in the form given by (27) and (28):

$$\frac{\partial p_{i,j}(s; t)}{\partial t} = -c_j(t)\,p_{i,j}(s; t) + c_i(t)\,p_{i,i}(s; t) \tag{27}$$

$$\frac{\partial p_{i,i}(s; t)}{\partial t} = -c_i(t)\,p_{i,i}(s; t) + c_j(t)\,p_{i,j}(s; t) \tag{28}$$

where $i \ne j$ and $i, j = 0, 1$.

Despite the appearance of the partial differential signs, these are actually ordinary differential equations. The equation for $p_{0,0}(s; t)$ would thus be written explicitly as

$$\frac{\partial p_{0,0}(s; t)}{\partial t} = -c_0(t)\,p_{0,0}(s; t) + c_1(t)\,p_{0,1}(s; t). \tag{29}$$

The equation corresponding to (28), which would be associated with (29), would involve the derivative of $p_{0,1}(s; t)$.

In the preceding equations (27)-(29) we have tacitly assumed that $c_{i,j}(t) = c_i(t)$. To prove this relation note that, corresponding to (2) of the previous section, we have the restraints:

$$p_{0,0}(s; t) + p_{0,1}(s; t) = 1 \tag{30}$$

$$p_{1,0}(s; t) + p_{1,1}(s; t) = 1. \tag{31}$$

If we differentiate both of these equations with respect to $s$, and then set $s = t$, we obtain:

$$c_{0,1}(t) = c_0(t) \tag{32}$$

$$c_{1,0}(t) = c_1(t), \quad \text{q.e.d.} \tag{33}$$

We shall next derive an important relation between $c_0(t)$ and $c_1(t)$. Let us assume that $\beta$ is known as a function of $t$. For example, $\beta$ might be a function of range or aspect or azimuth or all three; provided we know the aircraft trajectory we can always relate $\beta$ to the single parameter, $t$. We recall that the $p_{i,j}(s; t)$ are the conditional probability functions which give the probability that at time $t$ the system will be in state $j$ after having been initially in state $i$ at the earlier time $s$, no knowledge being assumed as to the intermediate states. Corresponding then to (3) of the previous section on Simple Markov Theory, we have for the continuous parameter case the analogous equations,

$$\beta(s)\,p_{1,1}(s; t) + [1 - \beta(s)]\,p_{0,1}(s; t) = \beta(t) \tag{34}$$

$$\beta(s)\,p_{1,0}(s; t) + [1 - \beta(s)]\,p_{0,0}(s; t) = 1 - \beta(t). \tag{35}$$

These two equations result from the definition of the blip-scan function; $\beta(s)$ and $\beta(t)$ are defined to be the single-scan or single-look absolute probabilities of detection at times $s$ and $t$ respectively. They are thus simply the values of the blip-scan function at times $s$ and $t$.

Taking partial derivatives of either of (34) or (35) with respect to $t$, we obtain one and the same equation, namely:

$$\beta'(s) = [1 - \beta(s)]\,c_0(s) - \beta(s)\,c_1(s) \tag{36}$$

where we have taken the limit as $t$ approaches $s$ after first having taken the derivatives. If we take partial derivatives of (34) and (35) with respect to $s$ and then take the limit as $s$ approaches $t$, we obtain the same equation, namely (36), with the parameter $s$ replaced by $t$. This result is to be expected as the functions $c_i$ are functions only of the instantaneous time. If we solve (36) for $c_0$, we obtain a general relation between $c_0$ and $c_1$ which must be




satisfied, wherein we write the variable as $\tau$, meaning that it may be any time:

$$c_0(\tau) = \frac{\beta'(\tau) + \beta(\tau)\,c_1(\tau)}{1 - \beta(\tau)}. \tag{37}$$

In order to solve the Kolmogorov differential equations, (27) and (28), it is first necessary to determine the form of the functions $c_0$ and $c_1$. As a guide in our solution, we know that for intervals of time separated by multiples of the radar scan period $T_s$, the transition probability function for constant $\beta$ must reduce to the higher-order transition matrix $[p^{(n)}_{i,j}]$ derived for the discrete first-order Markov process given by (9), namely

$$[p^{(n)}_{i,j}] = \begin{bmatrix} (1-\beta) + \rho^n\beta & \beta - \rho^n\beta \\ (1-\beta) - \rho^n(1-\beta) & \beta + \rho^n(1-\beta) \end{bmatrix}. \tag{38}$$

If an aircraft were flying a circumferential flight about the observing radar and if we started observing the aircraft at time $s$, then the exponent $n$ in (38) would be approximated by

$$n \approx \frac{t - s}{T_s} \tag{39}$$

where $T_s$ is the scan time of the radar. The equation is only approximate as the aircraft is moving; however, if

$$\frac{v_t T_s}{2\pi R} \ll 1 \tag{40}$$

then (39) is quite a good approximation. (Here $v_t$ is the aircraft tangential velocity component and $R$ is the aircraft range.)

If we replace $n$ in (38) by the expression in (39) we would then expect to obtain the stochastic matrix of the continuous parameter process in the stationary case of constant $\beta$:

$$[p_{i,j}(s; t)] = \begin{bmatrix} (1-\beta) + \rho^{(t-s)/T_s}\beta & \beta - \rho^{(t-s)/T_s}\beta \\ (1-\beta) - \rho^{(t-s)/T_s}(1-\beta) & \beta + \rho^{(t-s)/T_s}(1-\beta) \end{bmatrix}. \tag{41}$$


Using (23) and (24), we may derive from (41) expressions for $c_0$ and $c_1$, the matrix of which would thus be given by

$$[c_{i,j}(t)] = \frac{-\ln\rho}{T_s}\begin{bmatrix} \beta & \beta \\ (1-\beta) & (1-\beta) \end{bmatrix}$$

where we denote $c_{0,0}$ by $c_0$ and $c_{1,1}$ by $c_1$. Since in this case $\beta$ is considered time independent, that is a constant, the derivative of $\beta$ with respect to time is 0, and hence (37) is satisfied by these values of $c_{i,j}$. Hence, when $\beta$ is a constant we surmise that

$$c_0(t) = -\frac{\ln\rho}{T_s}\,\beta, \qquad c_1(t) = -\frac{\ln\rho}{T_s}\,(1-\beta). \tag{42}$$

It would be convenient if (42), valid for constant $\beta$, would also be applicable for the case where $\beta$ is a nonconstant function of time. However such cannot be the case, for in that event (37), which involves the derivative of $\beta$, would not be satisfied; however, (37) is satisfied if a single term in $\beta'$ is added algebraically to (42) to give

$$c_0(t) = \beta'(t) - \frac{\ln\rho}{T_s}\,\beta(t), \qquad c_1(t) = -\beta'(t) - \frac{\ln\rho}{T_s}\,[1 - \beta(t)]. \tag{43}$$

Eqs. (43), (44), and (45) lead to an interesting observation. We have remarked the $c_i(t)$ functions may be interpreted as probability densities, from which fact we require $c_i(t) \ge 0$. Thus, from (43), (44), and (45) we require

$$\frac{\ln\rho}{T_s}\,\beta(t) \le \beta'(t) \le -\frac{\ln\rho}{T_s}\,[1 - \beta(t)]. \tag{44}$$

This inequality is a restriction upon the set of blip-scan functions admissible in our treatment of nonstationary, continuous parameter, first-order Markov processes.

As we require the $c_i(t)$ functions to be real, we also observe from (42) and (43) that the scan-to-scan correlation coefficient of our continuous parameter process is limited to positive values, i.e.,

$$0 \le \rho \le 1. \tag{45}$$

If this were not the case, then $\ln\rho$ would have an imaginary part. (Note incidentally that $-\ln\rho$ is positive for positive $\rho$.) Using (23), (43), (44), and (45) could be derived from the following transition probability function matrix, if (as we surmise) the general nonstationary, continuous parameter first-order Markov process solution were given by

$$[p_{i,j}(s; t)] = \begin{bmatrix} 1 - \beta(t) & \beta(t) \\ 1 - \beta(t) & \beta(t) \end{bmatrix} + \rho^{(t-s)/T_s}\begin{bmatrix} \beta(s) & -\beta(s) \\ -[1 - \beta(s)] & [1 - \beta(s)] \end{bmatrix}. \tag{46}$$

A simple calculation will demonstrate that (43), (44), and (45) are indeed thus derivable.
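The "simple calculation" can be imitated numerically. The Python sketch below assumes that the diagonal elements of (46) are $p_{0,0}(s;t) = 1 - \beta(t) + \rho^{(t-s)/T_s}\beta(s)$ and $p_{1,1}(s;t) = \beta(t) + \rho^{(t-s)/T_s}[1-\beta(s)]$, differentiates them with respect to $t$ at $t = s$ by a forward difference as in (24), and compares the result with the closed forms for $c_0$ and $c_1$. The blip-scan function, its derivative, and all numerical values are hypothetical.

```python
import math


def p00(s, t, beta, rho, scan_time):
    """Miss-to-miss element of the assumed solution (46)."""
    return (1 - beta(t)) + rho ** ((t - s) / scan_time) * beta(s)


def p11(s, t, beta, rho, scan_time):
    """Hit-to-hit element of the assumed solution (46)."""
    return beta(t) + rho ** ((t - s) / scan_time) * (1 - beta(s))


def c_by_difference(t, beta, rho, scan_time, h=1e-6):
    """c_i(t) = -d p_ii(s; t)/dt evaluated at t = s, by a forward difference."""
    c0 = -(p00(t, t + h, beta, rho, scan_time) - p00(t, t, beta, rho, scan_time)) / h
    c1 = -(p11(t, t + h, beta, rho, scan_time) - p11(t, t, beta, rho, scan_time)) / h
    return c0, c1


def c_closed_form(t, beta, dbeta, rho, scan_time):
    """c_0 and c_1: the constant-beta forms plus or minus a single beta' term."""
    k = -math.log(rho) / scan_time
    return dbeta(t) + k * beta(t), -dbeta(t) + k * (1 - beta(t))


def blip_scan(t):                 # hypothetical blip-scan function
    return 0.5 + 0.3 * math.sin(0.2 * t)


def blip_scan_slope(t):           # its derivative
    return 0.06 * math.cos(0.2 * t)


numeric = c_by_difference(4.0, blip_scan, 0.6, 2.0)
exact = c_closed_form(4.0, blip_scan, blip_scan_slope, 0.6, 2.0)
```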

As may be proved by direct substitution, the solution of (46) satisfies both the Chapman-Kolmogorov equation (22), and the differential form of (27) and (28) with the $c_i(t)$ functions defined as in (23). It also satisfies the restrictions given by (30), (34), and (35), where $\beta(t)$ has been defined to be the first-order Markov process blip-scan probability function. The stochastic matrix of (46) is thus shown to be the required solution for the representation of binary radar data by means of a nonstationary, continuous parameter, first-order Markov process.
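The direct substitution into the Chapman-Kolmogorov equation can also be carried out numerically. The Python sketch below assumes the solution has the form of a matrix of absolute probabilities at the later time plus a correlated term weighted by $\rho^{(t-s)/T_s}$, multiplies the matrices for $(s; t)$ and $(t; u)$, and compares the product with the matrix for $(s; u)$. The blip-scan function and the times chosen are hypothetical, and the sample function is not checked against the admissibility restrictions.

```python
import math


def p_matrix(s, t, beta, rho, scan_time=1.0):
    """Assumed solution: absolute probabilities at t plus a term correlated with s."""
    r = rho ** ((t - s) / scan_time)
    bs, bt = beta(s), beta(t)
    return [
        [(1 - bt) + r * bs, bt - r * bs],
        [(1 - bt) - r * (1 - bs), bt + r * (1 - bs)],
    ]


def chapman_kolmogorov_gap(s, t, u, beta, rho):
    """Largest difference between P(s;u) and the product P(s;t) P(t;u)."""
    a, b, direct = (p_matrix(s, t, beta, rho), p_matrix(t, u, beta, rho),
                    p_matrix(s, u, beta, rho))
    return max(
        abs(direct[i][k] - sum(a[i][j] * b[j][k] for j in range(2)))
        for i in range(2) for k in range(2)
    )


def blip_scan(t):                 # hypothetical blip-scan function
    return 0.5 + 0.3 * math.sin(0.2 * t)


gap = chapman_kolmogorov_gap(1.0, 3.5, 7.25, blip_scan, 0.6)

# The restraint that each row sums to one, and the absolute-probability relation.
P = p_matrix(1.0, 3.5, blip_scan, 0.6)
hit_at_t = blip_scan(1.0) * P[1][1] + (1 - blip_scan(1.0)) * P[0][1]
```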

In simulation applications where one desires to generate artificially sequences of binary data (a first-order process being assumed) one would employ the scan-to-scan form of (46) where $T_s$ is the radar scan period; i.e.,

$$[p_{i,j}(t - T_s;\ t)] = \begin{bmatrix} 1 - \beta(t) & \beta(t) \\ 1 - \beta(t) & \beta(t) \end{bmatrix} + \rho\begin{bmatrix} \beta(t - T_s) & -\beta(t - T_s) \\ -[1 - \beta(t - T_s)] & [1 - \beta(t - T_s)] \end{bmatrix}. \tag{47}$$

This equation may be employed in practical cases to simulate sequences of binary data such as would be generated by an aircraft flying any trajectory whatsoever, provided that the blip-scan function is known as a function of time, that the statistics are indeed represented by a first-order Markov process, and that the correlation is constant.
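A simulation along these lines might be sketched in Python as follows. The sketch assumes the scan period is taken as the unit of time, so the blip-scan function is indexed by scan number, and that the probability of a hit on scan $n$ is $\beta(n) + \rho[1 - \beta(n-1)]$ after a hit and $\beta(n) - \rho\beta(n-1)$ after a miss. The function name `simulate_scans`, the seeds, and the constant blip-scan function in the usage lines are hypothetical.

```python
import random


def simulate_scans(beta, rho, n_scans, rng):
    """Generate a 0/1 sequence from the scan-to-scan transition probabilities.

    beta    : function giving the blip-scan probability on scan n
    rho     : constant scan-to-scan correlation, 0 <= rho <= 1
    n_scans : length of the sequence
    rng     : a random.Random instance (passed in so runs are repeatable)
    """
    state = 1 if rng.random() < beta(0) else 0   # first look uses the absolute probability
    sequence = [state]
    for n in range(1, n_scans):
        if state == 1:
            p_hit = beta(n) + rho * (1 - beta(n - 1))
        else:
            p_hit = beta(n) - rho * beta(n - 1)
        state = 1 if rng.random() < p_hit else 0
        sequence.append(state)
    return sequence


# Constant blip-scan ratio: the long-run hit fraction should be near beta.
steady = simulate_scans(lambda n: 0.6, 0.5, 20000, random.Random(1))
hit_fraction = sum(steady) / len(steady)

# Limiting case rho = 1 with constant beta: every scan repeats the first one.
frozen = simulate_scans(lambda n: 0.6, 1.0, 50, random.Random(2))
```

The blip-scan function supplied by the user must be admissible for the chosen correlation; otherwise `p_hit` can fall outside the interval 0 to 1.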

Heuristically, (46) permits an easy interpretation of itself. The first term is the blip-scan probability (the detection probability) at the time, $t$ (that is, at the time of the second observation on the radar data). This first term is modified by the correlated past history of the statistics according to the second term which is a function of $s$, that is of the earlier time at which the first observation was made. The matrix (46), as we have seen in (43), automatically reduces in the case of circular flights to the higher-order transition probability matrix of the discrete first-order Markov chain, if one measures time intervals between successive "looks" in multiples of the radar scan time.

It is interesting to note that if the correlation coefficient $\rho$ were a function of either $s$ or $t$, the form of (43), (44), and (45) would not be changed. However, the derivative of $\rho$ with respect to $s$ or $t$ would appear in the differential equations (27) and (28), from which fact we note that $\rho$ may not be a function of $s$ or $t$ in a first-order Markov process according to our formulation. Thus, our present solution is limited to the case of constant scan-to-scan correlation.

Another apparent restriction in generality imposed by our solution of (46) arises from the fact that the transition probabilities therein defined must lie in the closed interval 0 to 1. In the case of $p_{1,1}(s; t)$, we require

$$1 \ge \beta(t) + \rho^{(t-s)/T_s}[1 - \beta(s)] \ge 0,$$

and from $p_{0,1}$ we require

$$1 \ge \beta(t) - \rho^{(t-s)/T_s}\beta(s) \ge 0.$$

(The other two transition probabilities add no additional information.) Together these inequalities require

$$1 - \rho^{(t-s)/T_s}[1 - \beta(s)] \ge \beta(t) \ge \rho^{(t-s)/T_s}\beta(s), \qquad 0 \le \rho \le 1. \tag{48}$$

In the scan-to-scan formulation of (47), this inequality becomes

$$1 - \rho[1 - \beta(t)] \ge \beta(t + T_s) \ge \rho\beta(t). \tag{49}$$

The inequality of (48) is only a reflection of the restriction of (44), as is easily proved by differentiating (48), and then letting $s$ equal $t$.




Thus we are again reminded that not all possible blip- 
scan functions are admissible in our nonstationary, 
continuous parameter, first-order Markov process repre- 
sentation, but only those which satisfy the inequalities 
(44) and, equivalently, (48). 
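Whether a tabulated blip-scan function is admissible can be tested scan by scan. The Python sketch below assumes the scan-to-scan form of the restriction, namely that the value on the next scan must lie between $\rho\beta$ and $1 - \rho(1-\beta)$ evaluated on the current scan. The function name and the sample tables are hypothetical.

```python
def first_violation(beta_values, rho, eps=1e-12):
    """Return the index of the first scan whose blip-scan value is inadmissible.

    beta_values[n] is the blip-scan probability on scan n. The value on scan
    n + 1 must satisfy  rho*beta[n] <= beta[n+1] <= 1 - rho*(1 - beta[n]).
    Returns None when every scan-to-scan step is admissible.
    """
    for n in range(len(beta_values) - 1):
        lower = rho * beta_values[n]
        upper = 1 - rho * (1 - beta_values[n])
        if not (lower - eps <= beta_values[n + 1] <= upper + eps):
            return n + 1
    return None


gentle_rise = first_violation([0.0, 0.1, 0.2, 0.4], 0.5)    # slow change: admissible
sudden_drop = first_violation([0.8, 0.2], 0.5)              # falls below rho * 0.8
uncorrelated = first_violation([0.8, 0.0, 1.0], 0.0)        # rho = 0 imposes no limit
```

The restriction tightens as the correlation grows: with strong scan-to-scan correlation the blip-scan function cannot change rapidly from one scan to the next.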

By way of an example, let us consider an application of the nonstationary, continuous parameter, first-order Markov process to the problem of automatic initiation. Let us assume that from a large number of aircraft test flights a blip-scan ratio, $\beta$, has been determined as a function of range and altitude for that particular aircraft. This determination might be secured from circular flights, for example, it being assumed that the circular flight $\beta$ at a particular range is the same as the $\beta$ for any aspect of the aircraft at that range. Let us further assume that this blip-scan ratio becomes vanishingly small at some given range (where, as defined later, $n = 0$). If an aircraft flies inbound, say along a radial course, what is the cumulative probability that it will be detected prior to range $n$? Our detection criterion will be that an automatic initiation facility exists whereby, when hits are weighted +1 and misses −1 and a cumulative sum is maintained, the aircraft will be considered to be detected whenever this cumulative sum reaches say sum-state 4 (extension to other automatic initiation schemes is obvious). Fig. 3 presents a ladder diagram of the possible sum-state transitions where the sum-states, $k$, are labeled 1, 2, 3, and 4, and the number of scans, $n$, are 1, 2, 3, 4, 5, and so on.

We assume the radar scan-to-scan detection probability is represented by a nonstationary, first-order, continuous parameter Markov process. Let $S(k: n)$ be the probability that the initiation sum is in state $k$, exactly $n$ scans after $n = 0$. Let the cumulative detection probability, $C(4: N)$, be the probability that the aircraft will have been detected prior to or upon the $N$th scan; then we see by our initiation criterion that $C(4: N)$ is given by

$$C(4: N) = \sum_{n=1}^{N} S(4: n). \tag{50}$$


In the course of the analysis it develops that $S(k: n)$ enters only indirectly; rather, what is required is the probability that the sum-state $k$ on the $n$th scan was the result of a transition in which either a hit or miss (that is, scan-state 1 or 0) was the result of the $n$th scan. That is to say, we need the joint probability that sum-state $k$ and scan-state $i$ occur together on the $n$th scan. Let this joint probability be denoted by $S_i(k: n)$, wherein $i$ (= 0 or 1) is the scan-state resulting from the $n$th scan.

We observe that $S_1(0: n) = 0$ since it is impossible to enter sum-state 0 on any scan which results in a detection. Furthermore, as the aircraft flight is initiated, i.e., detected, whenever $S(4: n)$ is obtained (thereby halting the initiation process), we see that $S_0(3: n) = 0$ since no transition from sum-state 4 to 3 is possible. Thus we have six possible joint probabilities: $S_0(0: n)$, $S_0(1: n)$, $S_1(1: n)$, $S_0(2: n)$, $S_1(2: n)$, and $S_1(3: n)$; with


$$\begin{aligned} S(0: n) &= S_0(0: n) \\ S(1: n) &= S_0(1: n) + S_1(1: n) \\ S(2: n) &= S_0(2: n) + S_1(2: n) \\ S(3: n) &= S_1(3: n). \end{aligned} \tag{51}$$

Fig. 3—Transition ladder diagram. (Abscissa: number of scans; ordinate: sum-state, $k$.)

On Fig. 3, $S(k: n)$ are the sum-state probabilities at the point of intersection of the $n$th abscissa and the $k$th ordinate. The solid lines represent transitions resulting from hits, and the dashed lines represent transitions arising from misses. Consideration of Fig. 3 will show that:

$$\begin{aligned}
S_0(0: n) &= S_0(1: n-1)\,p_{0,0}(n-1; n) + S_1(1: n-1)\,p_{1,0}(n-1; n) + S_0(0: n-1)\,p_{0,0}(n-1; n) \\
S_0(1: n) &= S_0(2: n-1)\,p_{0,0}(n-1; n) + S_1(2: n-1)\,p_{1,0}(n-1; n) \\
S_1(1: n) &= S_0(0: n-1)\,p_{0,1}(n-1; n) \\
S_0(2: n) &= S_1(3: n-1)\,p_{1,0}(n-1; n) \\
S_1(2: n) &= S_0(1: n-1)\,p_{0,1}(n-1; n) + S_1(1: n-1)\,p_{1,1}(n-1; n) \\
S_1(3: n) &= S_0(2: n-1)\,p_{0,1}(n-1; n) + S_1(2: n-1)\,p_{1,1}(n-1; n).
\end{aligned} \tag{52}$$

Here the $p_{i,j}(n-1; n)$ are the scan-to-scan transition probabilities of the first-order, continuous-parameter Markov process given by (47) with $t$ replaced by $n$ and $T_s = 1$. Eqs. (52) are recurrence or difference equations; the initial conditions upon them are:

$$S_0(0: 0) = 1; \qquad S_i(k: 0) = 0, \quad k \ne 0. \tag{53}$$


Utilizing the recurrence (52) with the initial conditions (53), and employing the transition probabilities of (47), one may build up the probabilities of attaining various sum-states upon the various scans as indicated by the ladder diagram of Fig. 3. Such a difference equation method is particularly amenable to automatic computation as one need only store the results of the $(n-1)$st calculation in order to calculate the corresponding quantities on the $n$th scan. Thus using a computer to carry out the indicated calculation, the cumulative probability, $C(4: N)$, may be obtained. This solution is given in Fig. 4, where the resulting cumulative detection probability is given as a function of range from the initial scan, $n = 0$, with the sample blip-scan curve shown ($n = 0$ corresponds to a range of 202 miles). The various $n$'s are those which would be obtained by an inbound aircraft at a speed which happens to make $n$ equal to the aircraft range from the point $n = 0$. Cumulative detection probability curves for the automatic initiation criterion assumed are presented for the cases of scan-to-scan correlations of 0, 0.4, and 0.75. The limiting case, 1.0, is represented by the dashed line. As would be expected, the cumulative probability of detection improves as the (positive) scan-to-scan correlation grows larger.

Fig. 4—Blip-scan function and cumulative detection probability. (Abscissa: aircraft range, miles.)
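The difference-equation method lends itself to a short program. The Python sketch below is one possible arrangement, not the computation actually used for Fig. 4. It assumes the scan period is the unit of time, that the process starts with certainty in sum-state 0 after a miss, and that sum-state 4 is entered from sum-state 3 on a hit and is absorbing (this last step follows from the ladder description but is written here as an explicit assumption). The blip-scan functions in the usage lines are made up.

```python
def cumulative_detection(beta, rho, n_scans):
    """Return [C(4:1), ..., C(4:n_scans)] for the +1/-1 initiation criterion.

    beta : function giving the blip-scan probability on scan n (beta(0) at the start)
    rho  : constant scan-to-scan correlation
    """
    def p(i, j, n):
        """Transition probability from scan-state i on scan n-1 to j on scan n."""
        if i == 1:
            hit = beta(n) + rho * (1 - beta(n - 1))
        else:
            hit = beta(n) - rho * beta(n - 1)
        return hit if j == 1 else 1 - hit

    # S[i, k] = joint probability of scan-state i and sum-state k (six possible pairs).
    S = {(0, 0): 1.0, (0, 1): 0.0, (1, 1): 0.0, (0, 2): 0.0, (1, 2): 0.0, (1, 3): 0.0}
    detected = 0.0
    cumulative = []
    for n in range(1, n_scans + 1):
        new = {
            (0, 0): S[0, 1] * p(0, 0, n) + S[1, 1] * p(1, 0, n) + S[0, 0] * p(0, 0, n),
            (0, 1): S[0, 2] * p(0, 0, n) + S[1, 2] * p(1, 0, n),
            (1, 1): S[0, 0] * p(0, 1, n),
            (0, 2): S[1, 3] * p(1, 0, n),
            (1, 2): S[0, 1] * p(0, 1, n) + S[1, 1] * p(1, 1, n),
            (1, 3): S[0, 2] * p(0, 1, n) + S[1, 2] * p(1, 1, n),
        }
        detected += S[1, 3] * p(1, 1, n)   # assumed absorbing step into sum-state 4
        S = new
        cumulative.append(detected)
    return cumulative


# Independent scans with a constant hit probability of 0.5 after the start:
# the earliest possible detection is four straight hits, probability 0.5**4.
flat = cumulative_detection(lambda n: 0.0 if n == 0 else 0.5, 0.0, 6)

# A slowly rising, made-up blip-scan curve with correlation 0.4.
ramp = cumulative_detection(lambda n: min(0.9, 0.02 * n), 0.4, 100)
```

Only the six joint probabilities from the previous scan are stored at each step, which is the storage economy the text points out.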

Extension and further details of this study are available in the above-mentioned technical report.¹³


ACKNOWLEDGMENT 


The author wishes to acknowledge the aid of Dr. V. A. 
Nedzel, who initially suggested the study; Dr. E. J. 
Akutowicz and R. Gold, who proofread parts of the 
work; A. Ahlin and R. Holland, who carried out numerous 
numerical computations not herein included; and M. 
Carini, D. Sherrerd, and P. Restuccia, who aided in the
preparation of the text.


13 Sponsler, op. cit. 


Summary: A method for automatically controlling the threshold bias in a detector is described and analyzed.

In Section I, the threshold bias problem is described: Bias is set for a constant false alarm rate (or a constant false alarm time). By "standard biasing" is meant the common practice of adjusting the required bias under the assumption that the noise is Gaussian and has a flat power spectrum.

In Section II, the error that is made by standard biasing, if it turns out the Gaussian noise does not have a flat power spectrum, is given.

In Section III, the automatic biasing method is given in the case where a constant false alarm time is required (its operation to maintain a constant false alarm rate is analogous). The device envisioned operates as follows: One bias level $v_0$ is used as reference; the number of crossings per false alarm time $T$ with positive slope of the noise envelope through $v_0$ is averaged over a sufficiently long time to yield a stable value $C$. This count $C$ serves to determine the threshold bias $v$. The level $v$ changes only if the "long time" average count changes; it is specifically assumed that there is no response to instantaneous changes in $C$. It is shown that such a biasing method automatically adjusts to give a constant false alarm time (or rate), whenever the noise is Gaussian, and so has an advantage over the standard biasing method.

In Section IV, the efficiency of both methods for non-Gaussian noise is compared.

In Section V, the probability of detection of a short (relative to the averaging time) "sure" signal with Rayleigh distributed amplitude is given when automatic biasing is used; due to the complexity of the expression obtained, no direct comparison has been made with the case where standard biasing is used.


SECTION I


THE two "bias" problems in threshold detectors may be formulated as follows: 1) Let an arbitrary band-pass and time $T$ be given. Determine the voltage level which the envelope of noise crosses, with positive slope, on the average of once per time $T$. 2) Let an arbitrary band-pass and voltage level $v_0$ be given. Determine the time $T$ so that the envelope of noise crosses $v_0$ with positive slope on the average of once per time $T$.

Unless explicitly stated otherwise, the noise considered here is Gaussian with zero mean, but not necessarily "white." The term "crossover" will always mean "crossover with positive slope."

To solve either 1) or 2), one begins with Rice's equation.¹ The average number $C$ of times per second that the noise envelope crosses a level $v$ (with positive slope!) is


$$C = a(w) \cdot v \cdot \exp\left(-\frac{v^2}{2b_0}\right) \tag{1}$$


* Manuscript received by the PGIT, October 17, 1956. The work described in this paper was done in connection with a Boeing Aircraft Co. Subcontract.

† Dept. of Math., Univ. of Southern Calif., and consultant to Radio Corp. of America, Los Angeles, Calif.

‡ Manager, Systems Engineering, RCA, Los Angeles, Calif.

¹ S. O. Rice, "Properties of a sine wave plus random noise," Bell Sys. Tech. J., vol. 27, pp. 109-157; 1948. See (4.7).


Automatic Bias Control for a Threshold Detector*


J. DUGUNDJI† AND E. ACKERLIND‡


where

  $w(f)$ = power spectrum of noise
  $f_c$ = midband frequency
  $b_n = (2\pi)^n \int_0^\infty w(f)\,(f - f_c)^n\, df$, $n = 0, 1, 2$; ($b_0$ = mean noise power)

$$a(w) = \frac{1}{b_0} \sqrt{\frac{b_0 b_2 - b_1^2}{2\pi b_0}}.$$


Since $TC$ is the average number of crossovers through $v$ in time $T$, the voltage level and time in either of the problems 1) and 2) must satisfy the relation $TC = 1$, that is

$$1 = T \cdot a(w) \cdot v \cdot \exp\left(-\frac{v^2}{2b_0}\right). \tag{2}$$

It is clear that this can be solved for $T$ so long as $v \cdot a(w) \ne 0$; however (2) cannot be solved for $v$ unless $T$ satisfies certain conditions.

A) In Appendix I it is shown that problem 1) has a solution $v(T, w)$ if and only if $T \cdot a(w) \ge \sqrt{e/b_0}$, where $e$ is the base of natural logarithms. Problem 2) always has the solution

$$T(v_0, w) = \frac{1}{a(w)\, v_0} \exp\left[\frac{v_0^2}{2b_0}\right],$$

valid if $v_0 \cdot a(w) \ne 0$.

Whenever the condition $T \cdot a(w) > \sqrt{e/b_0}$ is satisfied, (2) actually has two solutions $0 < v_1 < \sqrt{b_0} < v_2$ (see Appendix I); by elementary physical reasoning, the larger of these two solutions is the bias level desired in problem 1).

A flat power spectrum in the given band-pass is denoted by $F$, and one has

$$a(F) = \beta \sqrt{\frac{\pi}{6 b_0}} \tag{3}$$

where

  $\beta$ = width of band-pass
  $F(f) = b_0 \beta^{-1}$, $f$ in the band-pass
  $F(f) = 0$ otherwise.
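As a numerical cross-check of (3), the following minimal Python sketch evaluates the moments $b_n$ with a midpoint rule and compares $a(w)$ for a flat spectrum against the closed form $\beta\sqrt{\pi/6b_0}$. The function names, the integration rule, and the numerical values of $\beta$, $b_0$ and $f_c$ are illustrative assumptions; the expression used for $a(w)$ is the one read from the definitions following (1).

```python
import math

def spectral_moments(w, f_lo, f_hi, f_c, n_steps=20000):
    """Midpoint-rule estimate of b_n = (2*pi)^n * integral w(f) (f - f_c)^n df, n = 0, 1, 2."""
    df = (f_hi - f_lo) / n_steps
    b = [0.0, 0.0, 0.0]
    for i in range(n_steps):
        f = f_lo + (i + 0.5) * df
        wf = w(f)
        for n in range(3):
            b[n] += (2.0 * math.pi) ** n * wf * (f - f_c) ** n * df
    return b

def a_of(b0, b1, b2):
    """a(w) = (1/b0) * sqrt((b0*b2 - b1^2) / (2*pi*b0)), as assumed from Section I."""
    return math.sqrt((b0 * b2 - b1 * b1) / (2.0 * math.pi * b0)) / b0

# Hypothetical band: width beta around midband f_c, mean power b0.
beta, b0, f_c = 2.0e3, 1.5, 1.0e6
flat = lambda f: b0 / beta                      # F(f) = b0 / beta inside the band
m = spectral_moments(flat, f_c - beta / 2, f_c + beta / 2, f_c)
a_num = a_of(*m)
a_closed = beta * math.sqrt(math.pi / (6.0 * b0))   # right side of (3)
```

The two values `a_num` and `a_closed` should agree to within the quadrature error; any other spectrum confined to the band can be passed in place of `flat`.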


B) It follows from (3) and from the preceding section that, for white noise in the band-pass, problem 1) has a solution if and only if

$$T \ge \frac{1}{\beta} \sqrt{\frac{6e}{\pi}}.$$

Problem 2) always has the solution

$$T(v_0, F) = \frac{1}{\beta} \sqrt{\frac{6 b_0}{\pi}} \cdot \frac{1}{v_0} \exp\left[\frac{v_0^2}{2b_0}\right].$$
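Problems 1) and 2) of this section can be sketched numerically as follows. This is a minimal Python illustration, not the authors' procedure: the bisection search, the function names, and the tolerance are assumptions, while the solvability test $T a \ge \sqrt{e/b_0}$ and the choice of the root above $\sqrt{b_0}$ follow statement A).

```python
import math

def crossings_per_T(v, T, a, b0):
    """Average number of positive-slope crossings of level v in time T (right side of (2))."""
    return T * a * v * math.exp(-v * v / (2.0 * b0))

def bias_level(T, a, b0, tol=1e-12):
    """Problem 1): largest root of T*a*v*exp(-v^2/(2*b0)) = 1, or None if no root exists."""
    if T * a < math.sqrt(math.e / b0):
        return None                      # condition of Section I, A) fails
    lo = math.sqrt(b0)                   # the left side of (2) is maximal here
    hi = 2.0 * lo
    while crossings_per_T(hi, T, a, b0) > 1.0:
        hi *= 2.0                        # expand until the curve has dropped below 1
    while hi - lo > tol * hi:            # bisection on the decreasing branch
        mid = 0.5 * (lo + hi)
        if crossings_per_T(mid, T, a, b0) >= 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def false_alarm_time(v0, a, b0):
    """Problem 2): T = exp(v0^2/(2*b0)) / (a*v0), valid when a*v0 != 0."""
    return math.exp(v0 * v0 / (2.0 * b0)) / (a * v0)

# Illustrative values only.
v = bias_level(T=100.0, a=1.0, b0=1.0)
```

Feeding the returned level back into `false_alarm_time` recovers the time `T` that was asked for, which is a convenient consistency check.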


Section II 


In practice, especially when the power spectrum of 
the noise is subject to change during the course of opera- 
tion of the detector, problems 1) and 2) are not solved 
by a continual recomputation of (1). In this part, the 
standard engineering practice, and its accuracy, will 
be considered. The problems 1) and 2) will be discussed 
separately. 

Problem 1): The standard biasing method consists in measuring $b_0$, assuming that the noise has the flat power spectrum $F(f) = b_0 \beta^{-1}$, and adjusting the threshold bias, once and for all, at the value $v(T, F)$. The error that can be committed by following this practice or, in other words, the range of variation in the average number of crossovers through $v(T, F)$, over all possible power spectra having the same mean power $b_0$, will now be determined. This requires two preliminary results.

A) Let $T$ and $w_1$ be such that $a(w_1) T > \sqrt{e/b_0}$, so that there exists a solution $v_1 = v(T, w_1)$ for problem 1). Then for noise having power spectrum $w$ and the same mean power as $w_1$, the average number of crossovers through $v(T, w_1)$ in time $T$ is

$$\frac{a(w)}{a(w_1)}$$

provided $a(w_1) \ne 0$.

This is seen by noting that $T a(w_1) v_1 \exp(-v_1^2/2b_0) = 1$ and that, according to (1), the average number of crossovers of the new noise through $v_1$ in time $T$ is

$$T a(w) v_1 \exp\left(-\frac{v_1^2}{2b_0}\right) = \frac{a(w)}{a(w_1)}.$$

B) It is shown in Appendix II that $0 \le a(w)/a(F) \le \sqrt{3}$ for all power spectra in the given band-pass having the same mean power as the flat $F$. Values as close to 0 as desired are obtained by concentrating more power in any one frequency; values as close to $\sqrt{3}$ as desired are obtained by concentrating more power equally on both ends of the pass band.

With these preliminaries, one obtains immediately the possible error that can be made by following the engineering practice:

C) Let $T > \frac{1}{\beta}\sqrt{6e/\pi}$ and assume standard biasing yields the level $v_1$. Then, regardless of how the noise is shaped, provided only that the mean power remains constant, the average number of crossings through $v_1$ in time $T$ will be between 0 and $\sqrt{3}$. Values as close to zero as desired will be obtained by concentrating a higher proportion of the power in a narrow portion of the band; values as close to $\sqrt{3}$ as desired are obtained by concentrating a higher proportion of the power equally at the ends of the band.

One merely applies A) and B) in this section, noting first that $a(F) T > \sqrt{e/b_0}$.
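The bounds in B) and C) can be illustrated with idealized line spectra. The sketch below is an assumption-laden illustration: it models a spectrum as a list of (offset from midband, power) pairs, uses the moment definitions of Section I, and evaluates the ratio $a(w)/a(F) = \sqrt{3}\,\sqrt{b_0 b_2 - b_1^2}/(\pi \beta b_0)$ quoted in Appendix II. The helper name and the sample spectra are hypothetical.

```python
import math

def ratio_to_flat(lines, beta):
    """a(w)/a(F) for a line spectrum given as [(offset_from_midband, power), ...]."""
    b0 = sum(p for _, p in lines)
    b1 = 2.0 * math.pi * sum(p * x for x, p in lines)
    b2 = 4.0 * math.pi ** 2 * sum(p * x * x for x, p in lines)
    disc = max(b0 * b2 - b1 * b1, 0.0)   # guard against tiny negative rounding error
    return math.sqrt(3.0) * math.sqrt(disc) / (math.pi * beta * b0)

beta = 1.0
# All power split equally between the two band edges: the upper extreme.
edges = ratio_to_flat([(-beta / 2, 0.5), (beta / 2, 0.5)], beta)
# All power at a single frequency: the lower extreme.
single = ratio_to_flat([(0.2 * beta, 1.0)], beta)
# Many equal lines spread uniformly over the band: approximates the flat spectrum F.
flat_like = ratio_to_flat([((k + 0.5) / 1000 - 0.5, 1.0 / 1000) for k in range(1000)], beta)
```

Under these assumptions `edges` evaluates to $\sqrt{3}$, `single` to essentially zero, and `flat_like` to approximately one, which matches the range stated in C) for the average number of crossings per time $T$.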
Problem 2): The standard engineering method for solving 2) consists in measuring $b_0$ and taking

$$T(v_0, F) = \frac{1}{\beta} \sqrt{\frac{6 b_0}{\pi}} \cdot \frac{1}{v_0} \exp\left[\frac{v_0^2}{2b_0}\right],$$

that is, in assuming that the noise has the flat power spectrum $F = b_0 \beta^{-1}$ in the band. To compute again the error that can be made by following this practice, note that from the relation

$$T(v_0, F)\, a(F)\, v_0 \exp\left[-\frac{v_0^2}{2b_0}\right] = 1 = T(v_0, w)\, a(w)\, v_0 \exp\left[-\frac{v_0^2}{2b_0}\right]$$

one gets

$$T(v_0, w) = T(v_0, F)\, \frac{a(F)}{a(w)}$$

whenever $a(w) \ne 0$. This, with II, B), leads at once to the following.

D) In problem 2), the range of variation in $T$ necessary to maintain one crossover per time $T$ at level $v_0$, over all changes in power spectrum having the same mean power and $a(w) \ne 0$, is

$$\frac{1}{\sqrt{3}}\, T(v_0, F) \le T(v_0, w) < \infty.$$

Values as close as desired to the smallest value are obtained by concentrating a higher proportion of the power equally at the ends of the band; arbitrarily large values are obtained by concentrating more power in a narrower part of the band-pass.
Observe that 


1 1 
mee HORT) Ps = 
Va Ne 
From Appendix I, it follows readily that, regardless of
the level $v_0$ used, one always has


Sane < 


Ei aN 


i 32. 
9 —= T — 
zs /3 ©, B 


Section III 


In this part, an engineering practice (which will require
somewhat more circuitry than that used in the standard
method) for automatically solving problems 1) and 2)
is described and discussed.

Select a voltage level $v_0$ which is to remain fixed once and for all. Continually count the number of times per second that the noise envelope crosses $v_0$ with positive slope, and let $C_0$ be a relatively long running average of these counts. Concurrently also measure the mean noise power $b_0$. Use the $C_0$ and $b_0$ so obtained to calculate $a$ from

$$C_0 = a \cdot v_0 \cdot \exp\left[-\frac{v_0^2}{2b_0}\right].$$

Problem 1): Whenever $Ta \ge \sqrt{e/b_0}$, the required bias level $v$ is obtained by finding the largest root of

$$T a \cdot v \cdot \exp\left[-\frac{v^2}{2b_0}\right] = 1.$$

Problem 2): Whenever $C_0 \ne 0$, the required time $T$ is obtained from

$$T a \cdot v_0 \cdot \exp\left[-\frac{v_0^2}{2b_0}\right] = 1.$$

The values of $v$ or $T$ so found change as $C_0$ and/or $b_0$ (hence $a$) change.

As an alternative procedure, one can select two distinct voltage levels $v_0$, $v_1$, and obtain the numbers $C_0$, $C_1$ at these respective levels as before. The two simultaneous equations

$$C_0 = a v_0 \exp\left[-\frac{v_0^2}{2b_0}\right]$$

$$C_1 = a v_1 \exp\left[-\frac{v_1^2}{2b_0}\right]$$

have a unique solution $(a, b_0)$ whenever $C_0 \ne C_1$; in this case one uses the $(a, b_0)$ so obtained to proceed as before.
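The two-level procedure can be sketched by eliminating $a$ between the two count equations: dividing them gives $\ln(C_0 v_1 / C_1 v_0) = (v_1^2 - v_0^2)/2b_0$, which yields $b_0$, after which $a$ follows from either equation. The Python below is a minimal illustration of that elimination; the function name and the sample numbers are assumptions, and the counts are taken to be exact long-time averages.

```python
import math

def solve_a_b0(v0, C0, v1, C1):
    """Recover (a, b0) from average crossing counts C0, C1 at two distinct levels v0, v1."""
    log_ratio = math.log((C0 * v1) / (C1 * v0))    # equals (v1^2 - v0^2) / (2*b0)
    b0 = (v1 * v1 - v0 * v0) / (2.0 * log_ratio)
    a = C0 / (v0 * math.exp(-v0 * v0 / (2.0 * b0)))
    return a, b0

# Synthetic check: generate counts from known (a, b0), then recover them.
a_true, b0_true = 3.0, 2.0
v0, v1 = 1.0, 2.5
C0 = a_true * v0 * math.exp(-v0 * v0 / (2.0 * b0_true))
C1 = a_true * v1 * math.exp(-v1 * v1 / (2.0 * b0_true))
a_est, b0_est = solve_a_b0(v0, C0, v1, C1)
```

With measured rather than exact counts, the logarithm amplifies counting errors when the two levels are close together, so well-separated levels would presumably be preferred in practice.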
The efficiency of this method for solving problem 1) is shown here.

A) Automatic biasing will adjust the bias level in problem 1) to maintain an average of one crossover per time $T$, regardless of what the power spectrum and mean power of the (Gaussian) noise is, provided only that $T a(w) \ge \sqrt{e/b_0}$; that is (see Section I, A), it will adjust to the correct bias level whenever such a bias level exists. Indeed, the count $C_0$ obtained satisfies

$$C_0 = a(w)\, v_0 \exp(-v_0^2/2b_0)$$

where $w$ is the actual power spectrum of the noise; the calculated $a$ is therefore exactly $a(w)$, and the result follows.

B) In exactly similar fashion, automatic biasing will adjust $T$ in problem 2) to maintain an average of one crossover at $v_0$ per time $T$, regardless of the power spectrum and the mean noise power, provided only that $a(w) \ne 0$; that is, it will adjust to the correct time interval whenever such a time exists.

These results apply, of course, also to the alternative procedure described above.




Section IV 


The efficiency of the standard and automatic methods, 
in case the noise in the band-pass is non-Gaussian, is 
considered. Only problem 1) is discussed; a discussion 
of 2) can be done along identical lines. 

Recall (Rice²) that the average number $C$ of times per second that the envelope of noise crosses the level $v$ with positive slope is

$$C = \int_0^\infty p(v, R')\, R'\, dR'$$

where $p(R, R')$ is the joint probability density of $R$ and the slope $R'$ at the same instant of time.

A) Let $T > \frac{1}{\beta}\sqrt{6e/\pi}$, and apply standard biasing to the non-Gaussian noise with probability density $p(R, R')$. The average number of crossovers $C$ through $v = v(T, F)$ in time $T$ is

$$C = \frac{1}{\beta v} \sqrt{\frac{6 b_0}{\pi}} \exp\left(\frac{v^2}{2b_0}\right) \int_0^\infty R'\, p(v, R')\, dR'.$$

For, by the hypothesis, a $v = v(T, F)$ exists satisfying

$$a(F)\, T v \exp\left(-\frac{v^2}{2b_0}\right) = 1; \tag{4}$$

the expected number of crossovers by $R$ through $v$ in time $T$ is

$$C = T \int_0^\infty R'\, p(v, R')\, dR'; \tag{5}$$

eliminating $T$ from (4) and (5), and using (3), gives the result.


B) Let automatic biasing be used, and assume that the experimental count $C_0$ at the fixed level $v_0$ satisfies

$$T C_0 \ge v_0 \exp\left(-\frac{v_0^2}{2b_0}\right) \sqrt{\frac{e}{b_0}}$$

where $b_0$ is the experimentally measured mean noise power. Then the average number of crossovers $C$ by the non-Gaussian $R$ through the automatic bias level $v$ in time $T$ is

$$C = \frac{v_0 \exp\left(-\dfrac{v_0^2}{2b_0}\right) \displaystyle\int_0^\infty R'\, p(v, R')\, dR'}{v \exp\left(-\dfrac{v^2}{2b_0}\right) \displaystyle\int_0^\infty R'\, p(v_0, R')\, dR'}.$$

In fact, the experimental count at $v_0$ will be

$$C_0 = \int_0^\infty p(v_0, R')\, R'\, dR'; \tag{6}$$

the $a$ is determined from $C_0 = a v_0 \exp(-v_0^2/2b_0)$, and because of the condition on $T C_0$, the automatic bias level $v$ exists: it satisfies

$$T C_0\, \frac{v \exp\left(-\dfrac{v^2}{2b_0}\right)}{v_0 \exp\left(-\dfrac{v_0^2}{2b_0}\right)} = 1. \tag{7}$$

The expected number of crossovers by $R$ through $v$ in time $T$ is

$$C = T \int_0^\infty R'\, p(v, R')\, dR'; \tag{8}$$

eliminating $T$ from (8) and (7) and using (6) gives the result.

It should be observed that, unless $T C_0$ satisfies the condition stated, the automatic bias control does not yield any bias level; it may be possible that $T > \frac{1}{\beta}\sqrt{6e/\pi}$ (so that standard biasing gives a level) even though $T C_0 < v_0 \exp(-v_0^2/2b_0) \sqrt{e/b_0}$ (so that automatic biasing gives no level); further discussion requires elaboration of $p(R, R')$. Note, however, that if one is using automatic biasing to solve problem 2), then the method yields a result whenever $C_0 \ne 0$.

Note, further, that if $R$, $R'$ are independent, then whenever automatic bias control applies, the crossover rate is independent of the slope distribution, a state of affairs not true in standard biasing.
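The independence remark can be made explicit with a short worked step. The marginal densities $p_R$, $p_{R'}$ and the constant $m$ below are notation introduced only for this sketch; the two crossover expressions are those of A) and B) in this section.

```latex
% Assumption: R and R' independent, so the joint density factors.
p(R, R') = p_R(R)\, p_{R'}(R'), \qquad
m = \int_0^\infty R'\, p_{R'}(R')\, dR' .

% Then, at any level x,
\int_0^\infty R'\, p(x, R')\, dR' = p_R(x)\, m .

% Automatic biasing, B): the slope factor m cancels between numerator and denominator,
C = \frac{v_0 \exp(-v_0^2/2b_0)\; p_R(v)}{v \exp(-v^2/2b_0)\; p_R(v_0)} .

% Standard biasing, A): the slope factor m remains,
C = \frac{1}{\beta v}\sqrt{\frac{6 b_0}{\pi}}\,
    \exp\!\left(\frac{v^2}{2b_0}\right) p_R(v)\, m .
```

Under this factorization the automatic-bias result depends only on the amplitude density at the two levels, while the standard-bias result still scales with the mean positive slope $m$.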


SECTION V 


In this part, automatic and standard biasing will 
be compared according to their efficiency in detecting a 
sure signal in noise. The biasing will be according to 
problem 1). 


Assume: 


1) Automatic biasing is used. 

2) A signal (sine wave superposed on the noise current) will appear during $0 < t < \mu T$ where $0 < \mu < 1$; its frequency is $f_c$.

3) The probability of the signal's appearance is uniformly distributed in $(0, \mu T)$ independently of its amplitude. When it appears, it has duration $\Delta$ with zero buildup and decay times. Finally, its amplitude $s$ has the Rayleigh distribution


os [ £| 

Fre laieonl 
The probability of detection (via a crossover at the bias 
level) is desired. 

A) Assuming that the power spectrum $w(f)$ of the noise is symmetric about $f_c$, it is shown in Appendix III that the probability that the envelope passes through the automatic bias level $v$ with positive slope during the time interval $(t, t + dt)$ is $L\,dt$ where

a 


1 A 1A bo ov 
b= 5 (1-4) + E 
‘aad (s) = ex |-#| ds 
ul’ J mt o ale 20 : 


TT Cea a ee 
To define $\gamma(s)$, let $p(R < x)$ be the probability that the envelope of noise alone is at a voltage level $< x$. Then


I Letina | 
Loy -s<r<)-tf 2ew| 2b, 


2 pe ox 
2sR 


ge) 
sare sin 


vee ee 


“are sin 


ip(R < 2) 


B) As a consequence, with automatic biasing, the probability that a signal of strength $s$ is acting at the time of a crossover at the bias level $v$ is


1s SaieAee 3° Sv 1 
Loi Ear: Wi ee (-£)n(2) 1 wT va} ; 


exp 


The probability that no signal is acting at the time of crossover is


In the case that the standard biasing method is used, letting $v_1$ be the bias level, one has, instead of $L$, the expression
_ aw) (1 - 4) Aeon. 
‘i al ee Tigers 


ov*:; 


1 ‘ ieee ps2 s s 
ey 3 bo (bo i} pa I ee (-#) 


where $\gamma_1(s)$ is the function described above, with $v$ replaced by $v_1$ throughout.


ds i 


APPENDIX I


First consider the function

$$\varphi(v) = aTv \exp\left(-\frac{v^2}{2b_0}\right).$$

It is evident that $\varphi(v) = -\varphi(-v)$, that $\varphi(v) > 0$ for $v > 0$, that $\varphi(0) = 0$, and that $\varphi(v) \to 0$ as $v \to \infty$. The critical points occur whenever $d\varphi/dv = 0$, that is, at $\pm\sqrt{b_0}$. From the above, $v = \sqrt{b_0}$ is the unique maximum of $\varphi$; for $v > 0$, $\varphi(v)$ steadily rises till $v = +\sqrt{b_0}$ and then steadily decreases.

A) The equation $aTv \exp[-v^2/2b_0] = 1$ has:

1) Two solutions $v_1$, $v_2$ with $0 < v_1 < \sqrt{b_0} < v_2$ if $aT > \sqrt{e/b_0}$, where $e$ is the base of natural logarithms.

2) One solution $v = \sqrt{b_0}$ if $aT = \sqrt{e/b_0}$.

3) No solutions if $aT < \sqrt{e/b_0}$.


Note first that the graph of $\varphi(v) - 1$ has the same form as that of $\varphi(v)$ but is depressed by one unit along the $\varphi$ axis. The solutions of $\varphi(v) = 1$ are the crossings of $\varphi(v) - 1$ with the $v$ axis. Now, if $\max \varphi(v) < 1$, the depressed curve will lie entirely below the $v$ axis; if $\max \varphi(v) = 1$, the depressed curve touches the $v$ axis at $v = +\sqrt{b_0}$ alone; if $\max \varphi(v) > 1$, it will cross at a single point $0 < v_1 < \sqrt{b_0}$ and, by the behavior of $\varphi$ as $v \to \infty$, also at a single point $v_2 > \sqrt{b_0}$. Since

$$\max \varphi(v) = \varphi(\sqrt{b_0}) = aT \sqrt{b_0} \exp(-\tfrac{1}{2})$$

the result follows.


APPENDIX II 


The proof is accomplished in three stages. Note that

$$\frac{a(w)}{a(F)} = \frac{\sqrt{3}}{\pi \beta b_0} \sqrt{b_0 b_2 - b_1^2}.$$

Since $\beta$, $b_0$ are constant, the quantity of interest is $B(w) = b_0 b_2 - b_1^2$. Call "symmetric" a power spectrum satisfying $w(f_c - f) = w(f_c + f)$, where $f_c$ is the midband. In the following, the graph of $w(f)$ is translated so that $f_c$ occurs at the origin.

A) For any given $w$, there exists a symmetric $w_s$ having the same mean power, with

$$B(w_s) \ge B(w).$$

Write

$$w(f) = \frac{w(f) + w(-f)}{2} + \frac{w(f) - w(-f)}{2} = w_s(f) + h(f).$$

$w_s$ is symmetric, and $h$ is an odd function. Since $w_s \ge 0$ it can be taken as a power spectrum. The mean power of $w_s$ is the same as that of $w$. In fact, because $h(f)$ is odd,

$$b_0^s = \int_{-\beta/2}^{\beta/2} w_s(f)\, df = \int_{-\beta/2}^{\beta/2} [w_s(f) + h(f)]\, df = \int_{-\beta/2}^{\beta/2} w(f)\, df = b_0;$$

because $w_s$ is even, $b_1^s = 0$; finally

$$b_2^s = 4\pi^2 \int_{-\beta/2}^{\beta/2} f^2 w_s(f)\, df = 4\pi^2 \int_{-\beta/2}^{\beta/2} f^2 [w_s(f) + h(f)]\, df = b_2.$$

Thus

$$B(w_s) = b_0^s b_2^s = b_0 b_2 \ge b_0 b_2 - b_1^2 = B(w).$$
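Stage A) can be checked numerically on a small example. The sketch below is illustrative only: it represents a spectrum by hypothetical spectral lines given as (offset from midband, power) pairs, forms the symmetric part $w_s$ by placing half of each line at the mirrored offset, and confirms that $B$ increases by exactly $b_1^2$.

```python
import math

def moments(lines):
    """(b0, b1, b2) for a line spectrum [(offset_from_midband, power), ...]."""
    b0 = sum(p for _, p in lines)
    b1 = 2.0 * math.pi * sum(p * x for x, p in lines)
    b2 = 4.0 * math.pi ** 2 * sum(p * x * x for x, p in lines)
    return b0, b1, b2

def B(lines):
    """B(w) = b0*b2 - b1^2, the quantity maximized in Appendix II."""
    b0, b1, b2 = moments(lines)
    return b0 * b2 - b1 * b1

def symmetrize(lines):
    """w_s(f) = [w(f) + w(-f)] / 2 for a line spectrum."""
    return [(x, p / 2.0) for x, p in lines] + [(-x, p / 2.0) for x, p in lines]

# An asymmetric example spectrum (values are arbitrary assumptions).
w = [(0.1, 0.7), (-0.4, 0.3)]
w_s = symmetrize(w)
gain = B(w_s) - B(w)          # should equal b1(w)^2, hence be non-negative
b1_sq = moments(w)[1] ** 2
```

The equality `gain == b1_sq` (up to rounding) mirrors the last display of A): symmetrizing leaves $b_0$ and $b_2$ unchanged and removes the $b_1^2$ term.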


B) Let $w$ be symmetric. Given any $\epsilon > 0$ there exists a symmetric $w_\epsilon$ having the same mean power, such that $w_\epsilon$ vanishes everywhere in $-\beta/2 + \epsilon < f < \beta/2 - \epsilon$ and $B(w_\epsilon) \ge B(w)$.


Let 


I 


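The possibility $f_0 = 0$ is not excluded. Define $\eta = \frac{1}{2}(f_0 + \beta/2)$, set

$$u(f) = \begin{cases} 0, & 0 \le f < \eta \\ w(f) + w(f + f_0 - \eta), & \eta \le f \le \beta/2 \end{cases}$$

and extend $u(f)$ symmetrically over $-\beta/2 \le f \le 0$.

The mean power of $u(f)$ is still $b_0$ since

$$\frac{1}{2} \int_{-\beta/2}^{\beta/2} u(f)\, df = \int_{\eta}^{\beta/2} w(f)\, df + \int_{\eta}^{\beta/2} w(f + f_0 - \eta)\, df = \left( \int_{\eta}^{\beta/2} + \int_{f_0}^{\eta} \right) w(f)\, df = \frac{b_0}{2}.$$

By symmetry, $b_1^u = 0$. Finally,

$$\frac{1}{2} b_2^u = 4\pi^2 \int_{\eta}^{\beta/2} f^2 w(f)\, df + 4\pi^2 \int_{f_0}^{\eta} w(f)\, [f + \eta - f_0]^2\, df.$$

Since for $f_0 \le f \le \eta$ one has $[f + \eta - f_0]^2 \ge f^2$ and since $w(f) \ge 0$, this shows

$$\frac{1}{2} b_2^u \ge 4\pi^2 \int_{f_0}^{\beta/2} f^2 w(f)\, df = \frac{b_2}{2}.$$

Thus $B(u) \ge B(w)$ and $u(f)$ is certainly zero in $-\beta/4 < f < \beta/4$. Starting anew with $u(f)$, repeat the construction to obtain a $u_2(f)$ zero everywhere in $-(\beta/2^2 + \beta/2^3) < f < (\beta/2^2 + \beta/2^3)$, having mean power $b_0$, and satisfying $B(u_2) \ge B(u) \ge B(w)$. By repeating the construction $N$ times, where

$$\frac{\beta}{2^{N+1}} < \epsilon,$$

the result follows.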


Proof of Section II, B)

By A) and B) in this Appendix, $\sup B(w) = B(w_0)$, where

$$w_0(f) = \frac{b_0}{2} \left[ \delta\left(f - \frac{\beta}{2}\right) + \delta\left(f + \frac{\beta}{2}\right) \right],$$

$\delta$ being the Dirac delta. A simple computation gives $B(w_0) = \pi^2 \beta^2 b_0^2$, so that $a(w_0)/a(F) = \sqrt{3}$. By elementary continuity considerations, there are power spectra $w$ with $a(w)$ as close to $a(w_0)$ as desired, hence $\sup a(w)/a(F) = \sqrt{3}$.

On the other hand, one has $0 \le B(w)$ and in fact, for $w_g = b_0\, \delta(f - g)$, $g$ any frequency in the band, actually $B(w_g) = 0$. Since again there are power spectra $w$ with $a(w)$ as close to $a(w_g)$ as desired, this shows $\inf a(w)/a(F) = 0$.


APPENDIX III


Let: $r(t)$ = envelope of noise alone
     $R(t)$ = envelope of noise plus signal $s$.

Fix a voltage level $v_1$ and assume known 1) the probability $c(v_1, s)\,dt$ that $R$ crosses $v_1$ with positive slope in the time $(t, t + dt)$ and 2) the probability $\gamma(s)$ that there will be a crossover at the instant the signal $s$ first appears. If an observation of the envelope be made, the probability that there will be a crossover during $(t, t + dt)$ will be determined.

Recall first that, if $p_G(H)$ is the probability of $H$, knowing $G$, the standard formula for conditional probabilities is

$$p(G) \cdot p_G(H) = p(G, H) = p(H) \cdot p_H(G) \tag{9}$$

where $p(G, H)$ is the joint probability of $G$ and $H$. Now let

  $G$ = crossover during $(t, t + dt)$,
  $H_1$ = signal $s$ present during entire time $(t, t + dt)$,
  $H_2$ = signal $s$ starts sometime during $(t, t + dt)$,
  $H_3$ = no signal (i.e., $s = 0$) during $(t, t + dt)$,
  $H_4$ = signal ends in $(t, t + dt)$.

Using (9) gives

  $p_G(H_1) = k\, p(H_1)\, p_{H_1}(G) = k\, p(H_1)\, c(v_1, s)\, dt$
  $p_G(H_2) = k\, p(H_2)\, \{\gamma(s) + A\, dt\}$
  $p_G(H_3) = k\, p(H_3)\, c(v_1, 0)\, dt$
  $p_G(H_4) = k\, p(H_4)\, E\, dt$

where $k = 1/p(G)$ and $A$, $E$ are quantities readily calculated.

These four equations together determine one distribution. The $k$ is therefore a normalizing factor; the sum of all the probabilities must equal 1, so that $p(G)$ can be determined from the condition

$$p_G(H_1) + p_G(H_2) + p_G(H_3) + p_G(H_4) = 1.$$

If $s$ itself is a random variable, the normalizing equation is

$$\int p_G(H_1)\, p(s)\, ds + \int p_G(H_2)\, p(s)\, ds + p_G(H_3) + p_G(H_4) = 1. \tag{10}$$

Turning to the problem in Section V one has


» (signal appears, at t) = 


pl 
and 
A dt 
p(H,) = iT 
dt 
p(H,) = aT 
A+ di 
p(H;) = 1 — + oe 
bw 
dt 
p(H,) a nT 


Letting $p(s)$ denote the probability that the signal has amplitude $s$, one gets from (10), upon neglecting terms involving $(dt)^2$,


of. i. a c(v,, 8)p(s) ds. (1 


a 


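In the automatic biasing method, $v_1$ always satisfies

$$aTv_1 \exp\left[-\frac{v_1^2}{2b_0}\right] = 1.$$

Since from Rice², using the assumption that $w(f)$ is symmetric around $f_c$,

$$c(v_1, s) = a v_1 \exp\left[-\frac{v_1^2 + s^2}{2b_0}\right] I_0\left(\frac{s v_1}{b_0}\right)$$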


one finds 
cv, s) 
CM 0) = 


To calculate y(s) recall that 


pe ATT ae 


where /,, J,, are independent normal variates with zerd 
means and variances by). Thus, i 


R<V0.+9' +0 
< R + dR) = p(r, R) dr dl 


fae We IIE << pg, 


the region Q being obvious. This yields: 


oe 
2s xP \ 2b. / a 


lr—s| SR orm 


rR 


pea) ee 


0 otherwise. 


from Be R) and yields the eee given in the oa 
With evident calculations, one obtains the results in
Section V, A) and B), from these expressions.


ACKNOWLEDGMENT 


The authors wish to thank T. L. Gottier, Manager of
West Coast Electronic Products Department, Radio Corporation
of America, for his advice and encouragement.


² Ibid., (4.8).


Summary: This paper presents the exact integral equation solution and synthesis for a large class of optimum time variable linear filters characterizing many physical problems. The signal random process is expressed in nonstationary Fourier series ensemble form, with certain statistical information assumed about its coefficients. The noise perturbation is represented by a damped exponential-cosine autocorrelation function, which is of major importance in fields of physics and engineering, such as radar, meteorology, and automatic control. For any finite operating period from 0 to $t$, the optimum time variable weighting function $h(\tau, t)$ is found to be of a separable form, consisting of functions of parameter $\tau$ multiplied by functions of parameter $t$, plus two delta function contributions at the beginning and end. Valid synthesis designs are developed for such separable weighting functions. Asymptotic synthesis techniques are formulated which cover special situations of long-time or short-time operation. The results are applied to two examples of practical interest.

INTRODUCTION 


THE FIRST PORTION of this paper presents the solution of a general integral equation, occurring in prediction and filter theory, which gives the mathematical form of the optimum time variable linear system for a large class of practical problems. The input messages $\{^{j}i(t)\}$ are assumed to be additive mixtures of signals $\{^{j}s(t)\}$ and noise $\{^{j}n(t)\}$, where $j$, which may or may not be countable, denotes the different ensemble members of each random process. These messages are taken to be zero for negative times so that their statistical properties are not invariant with respect to a shift in time. This corresponds physically to, say, having a starting switch in a system.

The input signals and the desired outputs $\{^{j}d(t)\}$ are permitted to be quite arbitrary in nature. The noise random process is specified only through its autocorrelation function, which is supposed to be of damped exponential-cosine form. This type of noise occurs in radar fading records¹ and in wind gust velocities,² to mention but two of many observed cases.

The desired output, $^{j}d(t)$, is any prescribed time varying linear operation on the signal component $^{j}s(t)$ alone; for example, $^{j}d(t)$ might be the signal $^{j}s(t)$ itself, or its future value, or its integrated value, etc. The signal random process is assumed to be a nonstationary random process which is expressible in finite Fourier series ensemble

* Manuscript received by the PGIT, August 8, 1956.

† The Ramo-Wooldridge Corp., Los Angeles 45, Calif.

¹ H. M. James, N. B. Nichols, and R. S. Phillips, "Theory of Servomechanisms," Rad. Lab. Ser., the McGraw-Hill Book Co., Inc., New York, N.Y., vol. 25, ch. 6; 1947.

² G. C. Clementson, "An Investigation of the Power Spectral Density of Atmospheric Turbulence," Rep. No. 6445-T-31, (Sc. D. Thesis), Instrumentation Lab., Mass. Inst. Tech., Cambridge, Mass.; May, 1950.


Exact Integral Equation Solutions and Synthesis for a
Large Class of Optimum Time Variable Linear Filters*


JULIUS S. BENDAT†


form, with certain statistical information known about its coefficients. It is required to approximate $^{j}d(t)$ as closely as possible, for any finite time of operation from 0 to $t$, by means of a time varying linear filter acting on the full input message $^{j}i(t) = {}^{j}s(t) + {}^{j}n(t)$. The criterion used in defining the optimum system is to minimize the mean square ensemble difference between the actual output response of the system and the desired output.

The second portion of the paper considers the synthesis question. It is noted that the derived optimum time variable linear filter weighting functions $h(\tau, t)$ are of a separable form, consisting of functions of parameter $\tau$ multiplied by functions of parameter $t$, plus two delta function contributions at the beginning and end. A schematic design is
developed for synthesizing such separable time variable 
weighting functions which is valid for all times of opera- 
tion. For many problems, however, the explicit form of 
the optimum time variable weighting function may be 
so complicated that an exact mechanization for all times 
of operation is an extremely involved matter. On the 
other hand, the physical situation may indicate that the 
apparatus is to be used for a very short, or for a very long, 
period of time where in either of these asymptotic operat- 
ing regions, the optimum weighting function assumes a 
simpler asymptotic form. Synthesizing an asymptotic 
weighting function instead of the optimum weighting 
function is shown to give an appropriate mechanization 
for the corresponding asymptotic region of operation. 

In order to make the paper self-contained, Section I 
presents in condensed form the main ideas in the under- 
lying prediction and filter theory. This theory originated 
from work by Wiener,³ Zadeh-Ragazzini,⁴ Booton,⁵ and
others. Section II takes up the general problem of interest 
in this paper and carries out the complete solution. 
The integral equation in question is shown in (16); the 
form of the answer appears in (21). Two applications are 
considered in Section III. A system design method for 
synthesizing separable time variable weighting functions 
of the mathematical form found in (21) is discussed in 
Section IV. Section V develops valid mechanizations for 
long-time operation for the two examples of Section III.


³ N. Wiener, "The Interpolation, Extrapolation and Smoothing of Stationary Time Series," NDRC Rep., Cambridge, Mass., 1942; John Wiley and Sons, New York, N.Y.; 1949.

⁴ L. Zadeh and J. Ragazzini, "An extension of Wiener's theory of prediction," J. Appl. Phys., vol. 21, pp. 645-655; July, 1950.

⁵ R. C. Booton, "An optimization theory for time-varying linear systems with nonstationary statistical inputs," M.I.T. Meteor Rep. No. 72; July, 1951, and Proc. IRE, vol. 40, pp. 977-981; August, 1952.




The Conclusion summarizes the principal results, while 
the Appendix contains details of mathematical proofs. 


SECTION I. BASIC THEORY


Consider a possibly nonstationary collection of input messages $\{^{j}i(t)\}$ composed of a mixture, not necessarily additive, of input signals $\{^{j}s(t)\}$ and perturbing noises $\{^{j}n(t)\}$; $j$ denotes the different ensemble members. Suppose that this message passes for a finite operating period $T$ through a particular time variable linear system characterized by its weighting function $h(\tau, t)$, where $h(\tau, t)$ denotes the response of the time varying system at time $t$ to a unit impulse applied at time $t - \tau$. For physical realizability, it is necessary that $t - \tau \le t$ or $\tau \ge 0$. The actual output response of the system is given by

$$^{j}r(t) = \int_0^T h(\tau, t)\, {}^{j}i(t - \tau)\, d\tau. \tag{1}$$
The above formula for $^{j}r(t)$ represents infinite operating time procedures by letting $T$ approach infinity. Independent of operating time, it applies to constant parameter linear systems by replacing $h(\tau, t)$ by $h(\tau)$, where $h(\tau)$ measures the response of a constant parameter system to a unit impulse after a time $\tau$ has elapsed. For the important practical case of a starting switch in the circuit, or where the input message is zero for negative time, $T$ becomes a variable $t$.

The desired output, $^{j}d(t)$, is permitted to be any time varying linear operation on the signal component of the message, $^{j}s(t)$ itself. The difference between the actual output response $^{j}r(t)$ and the desired output $^{j}d(t)$ is the system error $^{j}e(t)$,

$$^{j}e(t) = \int_0^T h(\tau, t)\, {}^{j}i(t - \tau)\, d\tau - {}^{j}d(t). \tag{2}$$
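It is required to determine the weighting function $h(\tau, t)$ such that the mean square ensemble average of $^{j}e(t)$, namely $\langle {}^{j}e^2(t) \rangle_{av}$ over $j$, is a minimum. Squaring and averaging $^{j}e(t)$ over $j$ gives

$$\langle {}^{j}e^2(t) \rangle_{av} = \int_0^T \int_0^T h(\tau, t)\, h(\nu, t)\, \langle {}^{j}i(t - \tau)\, {}^{j}i(t - \nu) \rangle_{av}\, d\tau\, d\nu - 2 \int_0^T h(\nu, t)\, \langle {}^{j}i(t - \nu)\, {}^{j}d(t) \rangle_{av}\, d\nu + \langle {}^{j}d(t)\, {}^{j}d(t) \rangle_{av}. \tag{3}$$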

By definition, for nonstationary processes, the autocorrelation function of the input message with itself, or the desired output with itself, is

$$\gamma_{ii}(t_1, t_2) = \langle {}^{j}i(t_1)\, {}^{j}i(t_2) \rangle_{av} \text{ over } j$$

$$\gamma_{dd}(t_1, t_2) = \langle {}^{j}d(t_1)\, {}^{j}d(t_2) \rangle_{av} \text{ over } j$$

the angular bracket notation indicating ensemble average, the remaining notation showing dependence upon the times of observation. The cross-correlation function of the input message and the desired output is

$$\gamma_{id}(t_1, t_2) = \langle {}^{j}i(t_1)\, {}^{j}d(t_2) \rangle_{av} \text{ over } j. \tag{6}$$

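In terms of the correlation functions, the mean square ensemble system error shown in (3) becomes

$$\langle {}^{j}e^2(t) \rangle_{av} = \int_0^T \int_0^T h(\tau, t)\, h(\nu, t)\, \gamma_{ii}(t - \tau, t - \nu)\, d\tau\, d\nu - 2 \int_0^T h(\nu, t)\, \gamma_{id}(t - \nu, t)\, d\nu + \gamma_{dd}(t, t).$$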


This expression gives the mean square ensemble system error for a particular time-variable linear system as specified by its weighting function $h(\tau, t)$. For fixed known nonstationary statistical correlation parameters, $\gamma_{ii}(t_1, t_2)$, $\gamma_{dd}(t_1, t_2)$, and $\gamma_{id}(t_1, t_2)$, it is required to minimize $\langle {}^{j}e^2(t) \rangle_{av}$ as a function of $h(\tau, t)$ in order to determine that time-variable linear system giving the least possible system error. (The existence and uniqueness of the solution is tacitly assumed in the above statement and others to follow. This is usually guaranteed from the physical nature of the problems under investigation.) The special $h(\tau, t)$ that gives the minimum value is called the optimum weighting function and characterizes the optimum system. If a particular $h(\tau, t)$ is optimum, then replacing $h(\tau, t)$ by a different operator $h(\tau, t) + \eta\, g(\tau, t)$, with arbitrary real $\eta$, cannot result in a smaller value. This variational technique proves the following.


Theorem 1 

A necessary and sufficient condition that $h(\tau, t)$ be the optimum weighting function is that it satisfy the integral equation

$$\gamma_{id}(t - \nu, t) = \int_0^T h(\tau, t)\, \gamma_{ii}(t - \nu, t - \tau)\, d\tau, \qquad 0 \le \nu \le T. \tag{8}$$

Proof: See Appendix.


Theorem 2 


The minimum mean square ensemble system error resulting from the optimum choice of $h(\tau, t)$ is

$$\langle {}^{j}e^2(t) \rangle_{av} = \gamma_{dd}(t, t) - \int_0^T h(\tau, t)\, \gamma_{id}(t - \tau, t)\, d\tau. \tag{9}$$

Proof: See Appendix.
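Theorems 1 and 2 can be illustrated by discretizing the integral equation on a grid and solving the resulting linear system. The sketch below is a rough numerical illustration, not the exact solution method of this paper: the midpoint grid, the Gaussian-elimination helper, and the example correlation functions (a random constant signal of mean square `alpha` in exponentially correlated noise, with the desired output equal to the signal) are all assumptions, and a grid solution only approximates any delta-function terms in the true weighting function.

```python
import math

def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def optimum_weights(gamma_ii, gamma_id, t, T, m):
    """Discretize Theorem 1 on m midpoints of [0, T]; return (taus, h, step)."""
    d = T / m
    taus = [(k + 0.5) * d for k in range(m)]
    G = [[gamma_ii(t - nu, t - tau) * d for tau in taus] for nu in taus]
    g = [gamma_id(t - nu, t) for nu in taus]
    return taus, solve_linear(G, g), d

# Hypothetical example correlations (assumptions for illustration).
alpha, A, k = 1.0, 0.5, 2.0
g_ii = lambda t1, t2: alpha + A * math.exp(-k * abs(t1 - t2))
g_id = lambda t1, t2: alpha
g_dd = lambda t1, t2: alpha

t = T = 1.0
taus, h, d = optimum_weights(g_ii, g_id, t, T, m=40)
# Theorem 2: minimum error = gamma_dd(t, t) - integral of h * gamma_id.
min_err = g_dd(t, t) - sum(hk * g_id(t - tau, t) * d for hk, tau in zip(h, taus))
```

On the grid, the Theorem 2 value coincides with the full quadratic form for the error evaluated at the computed weights, which gives a simple self-check of the discretized solution.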


Two major problems occur in practical applications of this basic theory. The first is to solve the above integral equation (Theorem 1) using the particular correlation functions involved in the physical situation. The answer depends only on these statistical parameters and the time of operation, and consequently applies to all other applications having similar information. Secondly, an engineering mechanization must be developed to reflect the mathematical result. A suggested design technique which will be valid for any desired operating region is to synthesize towards the required ensemble system error (Theorem 2).

A more extensive discussion of physical and mathematical ideas underlying this theory is contained in a previous paper.⁶


SECTION II. INTEGRAL EQUATION SOLUTION
FOR GENERAL PROBLEM


The basic theory will now be applied to a general
problem. For the input signals $\{{}^{j}s(t)\}$, consider any nonstationary
random process expressible in the following finite Fourier
series form:

$${}^{j}s(t) = \sum_{n=1}^{N} {}^{j}a_n \cos n\omega t + \sum_{n=1}^{N} {}^{j}b_n \sin n\omega t, \quad 0 \le t; \qquad {}^{j}s(t) = 0, \quad t < 0, \tag{10}$$

where $\omega$ = known constant, $N$ = known integer, and

$$\langle {}^{j}a_n\,{}^{j}a_m\rangle_{av}\ \text{over } j = \alpha_{nm}, \qquad \langle {}^{j}b_n\,{}^{j}b_m\rangle_{av}\ \text{over } j = \beta_{nm}, \qquad \langle {}^{j}a_n\,{}^{j}b_m\rangle_{av}\ \text{over } j = \gamma_{nm} \tag{11}$$

are assumed to be known or calculable statistical quantities.
The usual Fourier series constant term may be included,
if desired, by summing the series from 0 to $N$. Also,
there is no restriction as to the relative number of sine or
cosine terms since the coefficients may be zero. The
signals being zero for negative time may correspond
physically to the presence of a starting switch in the
system.

The desired outputs $\{{}^{j}d(t)\}$ are obtained from the incoming
signals by means of some preassigned time
varying linear operation, so that for a suitable known
weighting function $g(\tau, t)$, and for a time of operation
from 0 to $T$,

$${}^{j}d(t) = \int_0^t g(\tau,t)\,{}^{j}s(t-\tau)\,d\tau, \quad 0 \le t; \qquad {}^{j}d(t) = 0, \quad t < 0. \tag{12}$$


The noise random process $\{{}^{j}n(t)\}$ is assumed to have
the following autocorrelation function:

$$\gamma_{nn}(t_1,t_2) = A \exp(-k\,|t_1 - t_2|)\cos c(t_1 - t_2), \quad t_1, t_2 \ge 0; \qquad \gamma_{nn}(t_1,t_2) = 0, \quad t_1 < 0 \text{ or } t_2 < 0, \tag{13}$$

$A$, $k$ and $c$ being non-negative known constants.
(The various cross-correlations between noise and signal, and
noise and desired output, are assumed to be identically
zero.) The input is

$${}^{j}i(t) = {}^{j}s(t) + {}^{j}n(t). \tag{14}$$


6 J. S. Bendat, "A general theory of linear prediction and filtering,"
The Ramo-Wooldridge Corp., Control Systems Div.; December,
1955. Also J. Soc. Indust. Appl. Math., vol. 4, pp. 131-151;
September, 1956.


7 Bendat: Synthesis of Optimum Time Variable Linear Filters 


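Hence

$$\gamma_{ii}(t_1,t_2) = \gamma_{ss}(t_1,t_2) + \gamma_{nn}(t_1,t_2) \quad \text{when } \gamma_{sn}(t_1,t_2) = 0 = \gamma_{ns}(t_1,t_2),$$

$$\gamma_{id}(t_1,t_2) = \gamma_{sd}(t_1,t_2) \quad \text{when } \gamma_{nd}(t_1,t_2) = 0. \tag{15}$$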


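The general problem is now reduced to solving the
integral equation (8) of Theorem 1 (with $T$ replaced by $t$) for the
optimum weighting function $h(\tau, t)$. Upon substituting
the various correlation functions given above, the integral
equation takes the form

$$\sum_{n=1}^{N} P(n,t)\cos n\omega(t-\nu) + \sum_{n=1}^{N} Q(n,t)\sin n\omega(t-\nu)$$
$$= \int_0^t h(\tau,t)\Big[\sum_{n=1}^{N} F(n,t-\tau)\cos n\omega(t-\nu) + \sum_{n=1}^{N} G(n,t-\tau)\sin n\omega(t-\nu)\Big]\,d\tau$$
$$+ \int_0^t h(\tau,t)\big[A e^{-k|\nu-\tau|}\cos c(\nu-\tau)\big]\,d\tau, \qquad 0 \le \nu \le t, \tag{16}$$

where everything is known except $h(\tau, t)$, and

$$F(n,t-\tau) = \sum_{m=1}^{N}\big[\alpha_{nm}\cos m\omega(t-\tau) + \gamma_{nm}\sin m\omega(t-\tau)\big]$$
$$G(n,t-\tau) = \sum_{m=1}^{N}\big[\beta_{nm}\sin m\omega(t-\tau) + \gamma_{mn}\cos m\omega(t-\tau)\big] \tag{17}$$

$$P(n,t) = \int_0^t g(\tau,t)\,F(n,t-\tau)\,d\tau, \qquad Q(n,t) = \int_0^t g(\tau,t)\,G(n,t-\tau)\,d\tau. \tag{18}$$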
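
The role of $F$ and $G$ in (17) is that the signal autocorrelation of the finite Fourier series process can be written as $\gamma_{ss}(t_1,t_2) = \sum_n [F(n,t_2)\cos n\omega t_1 + G(n,t_2)\sin n\omega t_1]$. The sketch below checks this identity numerically against a direct ensemble average. It is an illustration only: the ensemble is a hypothetical set of randomly drawn coefficients, the second moments $\alpha_{nm}$, $\beta_{nm}$, $\gamma_{nm}$ are taken as sample averages over that set, and all names and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, w = 3, 1.3                      # hypothetical number of harmonics and base frequency
members = 500                      # hypothetical ensemble size

# Row j holds the coefficients a_1..a_N and b_1..b_N of ensemble member j
a = rng.normal(size=(members, N))
b = rng.normal(size=(members, N)) + 0.3 * a   # correlated with a, so gamma_nm is nonzero

alpha = a.T @ a / members          # alpha[n, m] = <a_n a_m> over j
beta = b.T @ b / members           # beta[n, m]  = <b_n b_m> over j
gamma = a.T @ b / members          # gamma[n, m] = <a_n b_m> over j
n = np.arange(1, N + 1)

def F(x):
    # F(n, x) of (17), returned for all n at once
    return alpha @ np.cos(n * w * x) + gamma @ np.sin(n * w * x)

def G(x):
    # G(n, x) of (17); note the transposed index order gamma_mn
    return beta @ np.sin(n * w * x) + gamma.T @ np.cos(n * w * x)

def gamma_ss_closed(t1, t2):
    return np.cos(n * w * t1) @ F(t2) + np.sin(n * w * t1) @ G(t2)

def gamma_ss_ensemble(t1, t2):
    s1 = a @ np.cos(n * w * t1) + b @ np.sin(n * w * t1)
    s2 = a @ np.cos(n * w * t2) + b @ np.sin(n * w * t2)
    return np.mean(s1 * s2)

closed = gamma_ss_closed(0.7, 2.1)
direct = gamma_ss_ensemble(0.7, 2.1)
print(closed, direct)
```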


Outline of Solution 


An exact mathematical solution of this complicated 
integral equation is obtained by setting up appropriate 
functions that will be amenable to Laplace transform 
treatment, and by creating systems of simultaneous time 
varying equations. In the Laplace transform analysis, 
all boundary conditions are ignored until the end when 
compensating delta functions are introduced. In order to 
solve for unknown time variable coefficients, sets of 
simultaneous time varying equations are derived which 
make the final result self-checking. 

The main lines of the solution are as follows. 

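Working from the integral equation (16), let

$$I(\nu,t) = A\int_0^t h(\tau,t)\,e^{-k|\nu-\tau|}\cos c(\nu-\tau)\,d\tau = \sum_{n=1}^{N}\mu_n(t)\cos n\omega(t-\nu) + \sum_{n=1}^{N}\nu_n(t)\sin n\omega(t-\nu) \tag{19}$$

where

$$\mu_n(t) = P(n,t) - \int_0^t h(\tau,t)\,F(n,t-\tau)\,d\tau, \qquad \nu_n(t) = Q(n,t) - \int_0^t h(\tau,t)\,G(n,t-\tau)\,d\tau. \tag{20}$$
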


The absolute value sign in the first expression for
$I(\nu, t)$ is removed by breaking up into two integrals. An
integration by parts is performed and variables changed
so as to generate convolution product integrals. Eq. (19)
is then solved for $h(\tau, t)$ in terms of $\mu_n(t)$ and $\nu_n(t)$, ignoring
the fact that $\mu_n(t)$ and $\nu_n(t)$ are themselves functions of
$h(\tau, t)$! In functional form, after a number of such operations
the result is


$$h(\tau,t) = \sum_{n=1}^{N} M_n(t)\cos n\omega\tau + \sum_{n=1}^{N} R_n(t)\sin n\omega\tau + S(t)e^{-a\tau} + Q(t)e^{a\tau} + U(t)\,\delta(\tau) + V(t)\,\delta(t-\tau), \qquad a = (c^2 + k^2)^{1/2}, \tag{21}$$

where $\delta(\tau)$ and $\delta(t - \tau)$ are delta functions and $M_n(t)$,
$R_n(t)$, $S(t)$, $Q(t)$, $U(t)$, and $V(t)$ are $2N + 4$ unknown
time varying coefficients still to be determined. These
$2N + 4$ time varying coefficients are now found by
substituting the functional form of $h(\tau, t)$, as shown in
(21), into the expressions for $\mu_n(t)$ and $\nu_n(t)$, given by
(20), and equating both sides of (19). This gives a system
of $2N + 4$ simultaneous time varying equations which
can be implicitly solved by following a special order. A
more complete discussion of details appears in the
Appendix.

Using the optimum weighting function (21), the minimum
mean square ensemble system error, obtained from
formula (9), is

$$\langle {}^{j}e^{2}(t)\rangle_{av} = \sum_{n=1}^{N} P(n,t)\int_0^t \big[g(\tau,t) - h(\tau,t)\big]\cos n\omega(t-\tau)\,d\tau + \sum_{n=1}^{N} Q(n,t)\int_0^t \big[g(\tau,t) - h(\tau,t)\big]\sin n\omega(t-\tau)\,d\tau. \tag{22}$$


SECTION III. TWO APPLICATIONS


The incoming signal, considered to be a member of a
random process $\{{}^{j}s(t)\}$, is of the form

$${}^{j}s(t) = {}^{j}a\cos\omega t + {}^{j}b\sin\omega t, \quad t \ge 0; \qquad {}^{j}s(t) = 0, \quad t < 0, \tag{23}$$

where $\omega$ = constant,

$$\langle ({}^{j}a)^2\rangle_{av}\ \text{over } j = \langle ({}^{j}b)^2\rangle_{av}\ \text{over } j = \alpha, \qquad \langle {}^{j}a\,{}^{j}b\rangle_{av}\ \text{over } j = 0, \tag{24}$$

so that the autocorrelation function $\gamma_{ss}(t_1, t_2)$ becomes

$$\gamma_{ss}(t_1,t_2) = \alpha\cos\omega(t_1 - t_2), \quad t_1, t_2 \ge 0; \qquad \gamma_{ss}(t_1,t_2) = 0, \quad t_1 < 0 \text{ or } t_2 < 0. \tag{25}$$

This signal is a special case (where $N = 1$) of the more
general signals treated in the previous section.

The noise random process $\{{}^{j}n(t)\}$ is assumed to have
the same damped exponential-cosine autocorrelation
function as before, (13). It is also again supposed that the
various cross-correlation functions are zero.

There are two objectives in mind. In Case 1, the desired
output is to recover the input signal, and in Case 2, it is
required to approximate the integral of the incoming
signal. Thus, to be specific, if the incoming message is a
distorted velocity signal, Case 1 has the task of finding
the optimum weighting function for a filter recovering
the velocity while Case 2 determines the optimum filter
to know the position.


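Case 1

$${}^{j}d(t) = {}^{j}s(t) \quad \text{for all } t.$$

Therefore,

$$\gamma_{dd}(t_1,t_2) = \gamma_{sd}(t_1,t_2) = \gamma_{ss}(t_1,t_2).$$

The integral equation to be solved for the
optimum weighting function $h(\tau, t)$, from (16), is

$$\alpha\cos\omega\nu = \int_0^t h(\tau,t)\big[\alpha\cos\omega(\nu-\tau) + A e^{-k|\nu-\tau|}\cos c(\nu-\tau)\big]\,d\tau, \qquad 0 \le \nu \le t. \tag{28}$$

Omitting all details, the solution of (28) may be found
from the previous analysis (with $N = 1$ and statistical
terms as indicated above). The solution is

$$h(\tau,t) = M(t)\cos\omega\tau + R(t)\sin\omega\tau + S(t)e^{-a\tau} + Q(t)e^{a\tau} + U(t)\,\delta(\tau) + V(t)\,\delta(t-\tau), \qquad a = (c^2 + k^2)^{1/2}, \tag{29}$$

where the time varying coefficients $M(t)$, $R(t)$, $S(t)$, $Q(t)$, $U(t)$,
and $V(t)$ are determined through a known system of six
simultaneous time varying equations. On solving these
simultaneous equations, one can demonstrate that for
large $t$,

$$M(t) \sim 2(t+\lambda)^{-1} \quad \text{as } t \to \infty$$
$$R(t) = o(M(t)) \quad \text{as } t \to \infty$$
$$S(t) \sim \rho\,M(t) \quad \text{as } t \to \infty$$
$$Q(t) = o(e^{-at}) \quad \text{as } t \to \infty \tag{30}$$
$$U(t) \sim \rho_1 M(t) \quad \text{as } t \to \infty$$
$$V(t) \sim (\rho_2\sin\omega t + \rho_3\cos\omega t)\,M(t) \quad \text{as } t \to \infty$$

where the notation $f(t) \sim g(t)$, read "$f(t)$ asymptotic to
$g(t)$", as $t \to \infty$, means that $f(t)/g(t) \to 1$ as $t \to \infty$, while
$f(t) = o(g(t))$ as $t \to \infty$ means that $f(t)/g(t) \to 0$ as $t \to \infty$.
The parameters $\lambda$, $\rho$, $\rho_1$, $\rho_2$ and $\rho_3$ denote certain constants.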


Exact expressions for $M(t)$ and $R(t)$ for all times $t$ may be
obtained if desired, and through them exact expressions
for the other time varying coefficients. Unfortunately,
these are lengthy expressions involving combinations of
trigonometric and exponential functions which are much
too cumbersome to be useful. The precise nature of
$\rho_1$, $\rho_2$ and $\rho_3$ is not significant. However, $\lambda$ and $\rho$ have the
following meaning in terms of known information,

$$\lambda = 4Ak(\omega^2 + a^2)(\alpha d)^{-1} \tag{31}$$
$$\rho = 2a(a-k)(\omega^2 - a^2)\,d^{-1} \tag{32}$$
$$d = \big[(\omega - c)^2 + k^2\big]\big[(\omega + c)^2 + k^2\big] \tag{33}$$
$$a = (c^2 + k^2)^{1/2}. \tag{34}$$

Thus $\lambda > 0$ if $Ak > 0$, and $\rho = 0$ if $a - k = 0$. The
constant $a - k = 0$ if $c = 0$, namely, when the noise
autocorrelation function is a damped exponential alone.
From (22), the minimum mean square ensemble
system error for arbitrary values of $t$, when using the
optimum weighting function (29), is given by

$$\langle {}^{j}e^{2}(t)\rangle_{av} = 2Ak(\omega^2 + a^2)\,d^{-1}M(t). \tag{35}$$

This error approaches zero for large $t$ since $M(t)$ has this
property.
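
As a hedged numerical illustration of how the Case 1 constants combine, the sketch below evaluates $a$, $d$, $\lambda$ and $\rho$ and the large-$t$ error obtained by putting the asymptotic form $M(t) \sim 2(t+\lambda)^{-1}$ into the error formula (35). The readings assumed here are $a = (c^2+k^2)^{1/2}$, $d = [(\omega-c)^2+k^2][(\omega+c)^2+k^2]$, $\lambda = 4Ak(\omega^2+a^2)/(\alpha d)$ and $\rho = 2a(a-k)(\omega^2-a^2)/d$, with $\alpha$ denoting the mean square value of the signal coefficients; the function names and the numerical values are hypothetical.

```python
import math

def case1_parameters(A, k, c, w, alpha):
    """Constants of Case 1 under the readings stated in the lead-in (assumptions)."""
    a = math.sqrt(c * c + k * k)
    d = ((w - c) ** 2 + k * k) * ((w + c) ** 2 + k * k)
    lam = 4.0 * A * k * (w * w + a * a) / (alpha * d)
    rho = 2.0 * a * (a - k) * (w * w - a * a) / d
    return a, d, lam, rho

def case1_asymptotic_error(t, A, k, c, w, alpha):
    """Large-t error: error formula with M(t) replaced by 2 / (t + lambda)."""
    a, d, lam, _ = case1_parameters(A, k, c, w, alpha)
    M = 2.0 / (t + lam)
    return 2.0 * A * k * (w * w + a * a) / d * M

# Hypothetical values: pure damped exponential noise (c = 0)
a, d, lam, rho = case1_parameters(A=0.5, k=1.0, c=0.0, w=2.0, alpha=1.0)
err_5 = case1_asymptotic_error(5.0, A=0.5, k=1.0, c=0.0, w=2.0, alpha=1.0)
err_50 = case1_asymptotic_error(50.0, A=0.5, k=1.0, c=0.0, w=2.0, alpha=1.0)
print(a, d, lam, rho, err_5, err_50)
```

With $c = 0$ the sketch gives $a = k$ and $\rho = 0$, in line with the remark that $a - k = 0$ when the noise autocorrelation is a damped exponential alone, and the error decreases roughly as $1/t$.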


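Case 2

$${}^{j}d(t) = \int_0^t {}^{j}s(\tau)\,d\tau = {}^{j}a\,\omega^{-1}\sin\omega t + {}^{j}b\,\omega^{-1}(1 - \cos\omega t), \quad t \ge 0; \qquad {}^{j}d(t) = 0, \quad t < 0. \tag{36}$$

The new integral equation which one needs to solve for the
optimum weighting function $h(\tau, t)$ is

$$\alpha\,\omega^{-1}\big[\sin\omega\nu + \sin\omega(t-\nu)\big] = \alpha\int_0^t h(\tau,t)\cos\omega(\nu-\tau)\,d\tau + A\int_0^t h(\tau,t)\,e^{-k|\nu-\tau|}\cos c(\nu-\tau)\,d\tau, \qquad 0 \le \nu \le t. \tag{37}$$

As before, the solution has the form

$$h(\tau,t) = M(t)\cos\omega\tau + R(t)\sin\omega\tau + S(t)e^{-a\tau} + Q(t)e^{a\tau} + U(t)\,\delta(\tau) + V(t)\,\delta(t-\tau), \qquad a = (c^2 + k^2)^{1/2}, \tag{38}$$

where the time varying coefficients $S(t)$, $Q(t)$, $U(t)$ and
$V(t)$ bear similar relationships to $M(t)$ and $R(t)$ as in
Case 1. It is mainly in the final determination of $M(t)$,
$R(t)$ and $S(t)$ that Cases 1 and 2 differ, for now,

$$M(t) \sim 2[\omega(t+\lambda)]^{-1}\sin\omega t \quad \text{as } t \to \infty$$
$$R(t) \sim 2[\omega(t+\lambda)]^{-1}[1 - \cos\omega t] \quad \text{as } t \to \infty \tag{39}$$
$$S(t) \sim \rho\,M(t) + \sigma\,R(t) \quad \text{as } t \to \infty.$$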


The parameters $\lambda$ and $\rho$ are the same as previously,
(31) and (32), while the constant $\sigma$ is given by

$$\sigma = 4\omega a k(a - k)\,d^{-1}. \tag{40}$$

Note that $\sigma = 0$ if $a - k = 0$, that is, if $c = 0$. If desired,
exact lengthy expressions may be derived for the time
varying coefficients at arbitrary operating times $t$.

The minimum mean square ensemble system error
possible in Case 2, for any value of $t$, is given by

$$\langle {}^{j}e^{2}(t)\rangle_{av} = 2Ak(\omega^2 + a^2)(\omega d)^{-1}\{M(t)\sin\omega t + R(t)(1 - \cos\omega t)\}. \tag{41}$$

As in Case 1, this error approaches zero for large $t$.


SECTION IV. SYNTHESIS OF SEPARABLE TIME
VARIABLE WEIGHTING FUNCTIONS


From the nature of the solutions derived in the first
part of the paper, (21), a new category of separable time
variable weighting functions will be defined and investigated.


Definition 1


A time variable weighting function $h(\tau, t)$ is said to be
separable if

$$h(\tau,t) = f(t)\,g(\tau) \tag{42}$$

where $f(t)$, the time varying factor, is a function of $t$
alone, while $g(\tau)$, the constant parameter factor, is a
function of $\tau$ alone.

The decomposition possible for separable time variable
weighting functions is essential to the further discussion.
In particular, more complicated separable weighting functions
may be involved of the form,

$$h(\tau,t) = \sum_{n=1}^{M} f_n(t)\,g_n(\tau) + U(t)\,\delta(\tau) + V(t)\,\delta(t-\tau) \tag{43}$$

where $M$ = integer, and $\delta(\tau)$ and $\delta(t - \tau)$ are delta functions
at $\tau = 0$ and $\tau = t$, respectively.

The response $r(t)$ to an input $i(t)$ for a system characterized
by a separable weighting function (43) is given by

$$r(t) = \sum_{n=1}^{M} f_n(t)\int_0^t g_n(\tau)\,i(t-\tau)\,d\tau + U(t)\,i(t) + V(t)\,i(0). \tag{44}$$

This follows directly from the definition of $h(\tau, t)$ and
the superposition property of linear systems. Note that
the effect of the delta function terms is to merely pick
out the final and initial values of the input for special
attention. These are multiplied by time varying gains
$U(t)$ and $V(t)$. The factors $g_n(\tau)$, $n = 1, 2, \dots, M$ represent
$M$ constant parameter linear weighting functions, and
the corresponding $M$ time varying coefficients $f_n(t)$,
$n = 1, 2, \dots, M$ denote time varying gain amplifiers.
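
The response of a separable system, a sum of constant parameter filters followed by time varying gains plus the two endpoint terms, can be sketched numerically as follows. This is an illustrative sketch only: the function name `separable_response`, the midpoint integration rule, and the example factors $f_1(t) = 1/(1+t)$, $g_1(\tau) = e^{-\tau}$, $U(t) = 0.1$, $V(t) = 0.2e^{-t}$ with a unit step input are all hypothetical choices made so that a closed form is available for comparison.

```python
import numpy as np

def separable_response(f_list, g_list, U, V, inp, t, n=4000):
    """Response r(t) of a separable weighting function to the input inp.

    Each pair (f, g) contributes f(t) times the convolution of g with the input
    over [0, t]; U and V weight the final value inp(t) and the initial value inp(0).
    """
    dtau = t / n
    tau = (np.arange(n) + 0.5) * dtau          # midpoint rule on [0, t]
    r = 0.0
    for f, g in zip(f_list, g_list):
        r += f(t) * np.sum(g(tau) * inp(t - tau)) * dtau
    return r + U(t) * inp(t) + V(t) * inp(0.0)

# Hypothetical example with one separable term and a unit step input
f_list = [lambda t: 1.0 / (1.0 + t)]
g_list = [lambda tau: np.exp(-tau)]
U = lambda t: 0.1
V = lambda t: 0.2 * np.exp(-t)
step = lambda t: np.ones_like(t, dtype=float)

t = 3.0
r_numeric = float(separable_response(f_list, g_list, U, V, step, t))
r_closed = (1.0 - np.exp(-t)) / (1.0 + t) + 0.1 + 0.2 * np.exp(-t)
print(r_numeric, r_closed)
```

The structure of the function mirrors the schematic idea: each $g_n$ is a fixed filter, and only the scalar gains depend on the operating time.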

A schematic diagram to generate $h(\tau, t)$ for arbitrary
operating times is displayed in Fig. 1. Switch S is initially
closed at $t = 0$ and open for all $t > 0$.



Fig. 1—Exact synthesis of separable h(z, 1). 


Fig. 1 indicates how to synthesize any separable time
variable weighting function of class (43). If one now
examines important features of the optimum time variable
weighting functions determined in (21) for a general
case, $M = 2N + 2$ and the constant parameter factors
$g_n(\tau)$, $n = 1, 2, \dots, 2N + 2$, are of the following definite
types:

$$\cos n\omega\tau, \quad \sin n\omega\tau \quad (n = 1, 2, \dots, N), \qquad e^{-a\tau}, \qquad e^{a\tau}, \qquad a = (c^2 + k^2)^{1/2}. \tag{45}$$

The optimum time variable linear weighting function is
given by

$$h_0(\tau,t) = \sum_{n=1}^{N} M_n(t)\cos n\omega\tau + \sum_{n=1}^{N} R_n(t)\sin n\omega\tau + S(t)e^{-a\tau} + Q(t)e^{a\tau} + U(t)\,\delta(\tau) + V(t)\,\delta(t-\tau) \tag{46}$$

where the $2N + 4$ time varying coefficients $M_n(t)$, $R_n(t)$,
$S(t)$, $Q(t)$, $U(t)$ and $V(t)$ are found from a system of $2N + 4$
simultaneous time varying equations (details in Appendix).
Simple mechanizations for the constant parameter
weighting functions of (45) are well known.

In the two examples of Section III, the optimum time
variable weighting function has the form corresponding
to $N = 1$,

$$h_0(\tau,t) = M(t)\cos\omega\tau + R(t)\sin\omega\tau + S(t)e^{-a\tau} + Q(t)e^{a\tau} + U(t)\,\delta(\tau) + V(t)\,\delta(t-\tau). \tag{47}$$

The six time varying coefficients $M(t)$, $R(t)$, $S(t)$, $Q(t)$,
$U(t)$, and $V(t)$ are determined from a system of six
simultaneous time varying equations. Consequently, if
these six quantities are mechanizable, an exact synthesis
for arbitrary times of operation would be given by the
schematic design of Fig. 2.


Fig. 2—Exact synthesis for NV = 1 examples. 


Figs. 1 and 2 show useful direct ways for handling
many synthesis problems. Fig. 1 illustrates how one
mechanizes a general separable time variable weighting
function (43), while Fig. 2 gives a design appropriate to
the optimum time variable linear filters derived in Section
III for the special cases when $N = 1$. Optimum filters for
arbitrary $N \ge 2$ are obtained by inserting additional
$\cos n\omega\tau$ and $\sin n\omega\tau$ components, $n = 2, 3, \dots, N$, multiplied
by required time varying gains $M_n(t)$ and $R_n(t)$.


Definition 2 


For long-time (short-time) operation, $h_a(\tau, t)$ is an
asymptotic weighting function to an optimum weighting
function $h_0(\tau, t)$ if

$$h_a(\tau,t)/h_0(\tau,t) \to 1 \quad \text{as } t \to \infty \ (t \to 0). \tag{48}$$

This is written

$$h_a(\tau,t) \sim h_0(\tau,t) \quad \text{as } t \to \infty \ (t \to 0).$$

An immediate consequence of this definition is that for
separable $h_0(\tau, t)$, say, $h_0(\tau, t) = f_0(t)\,g_0(\tau)$, then $h_a(\tau, t) =
f_a(t)\,g_0(\tau) \sim h_0(\tau, t)$ as $t \to \infty$ ($t \to 0$) according to whether
or not $f_a(t) \sim f_0(t)$ as $t \to \infty$ ($t \to 0$). Also, for the extended
class of separable time variable weighting functions
covered by (43), asymptotic forms can be derived by
merely finding suitable asymptotic forms to each of the
time varying coefficients. This is a particularly convenient
method to follow for optimum weighting functions (46)
where the time varying coefficients are specified through
systems of simultaneous equations.


Definition 3 


The notation $r_0(t)$ and $r_a(t)$ will be used to denote the
response of an optimum system $h_0(\tau, t)$ and its asymptotic
system $h_a(\tau, t)$, respectively, to an arbitrary input $i(t)$.

$$r_0(t) = \int_0^t h_0(\tau,t)\,i(t-\tau)\,d\tau \tag{49}$$
$$r_a(t) = \int_0^t h_a(\tau,t)\,i(t-\tau)\,d\tau. \tag{50}$$

The fact that $h_a(\tau, t)/h_0(\tau, t) \to 1$ necessarily only as
$t \to \infty$ ($t \to 0$), plus the observation that $|r_0(t)|$ is bounded
for a bounded input proves $|r_a(t) - r_0(t)| \to 0$ as $t \to \infty$
($t \to 0$). This leads to


Theorem 3


Mechanization of an asymptotic weighting function
gives a synthesis which is valid in the corresponding
asymptotic region of operation. Nothing can be inferred
in general about its performance during the non-asymptotic
period. Proof: See Appendix.

A synthesis for an asymptotic separable weighting
function, satisfying (48), will have a schematic design
similar to Fig. 1 or Fig. 2.


SECTION V. EXAMPLES OF ASYMPTOTIC SYSTEM DESIGN


Valid mechanizations for long-time operation will now
be derived for the optimum time variable linear filter
weighting functions found in Cases 1 and 2. Exact synthesis
for all times of operation does not appear to be a feasible
engineering problem.


Case 1


As $t \to \infty$, one sees from (29) and (30) that the asymptotic
weighting function takes the form

$$h_a(\tau,t) = 2(t+\lambda)^{-1}\big[\cos\omega\tau + \rho e^{-a\tau} + \rho_1\delta(\tau) + (\rho_2\sin\omega t + \rho_3\cos\omega t)\,\delta(t-\tau)\big]. \tag{51}$$

Because of the time factor $(t + \lambda)^{-1}$, the effects of the
two delta functions (i.e., merely picking out the initial
and final values of the input) can be ignored in long-time
operation. Hence, one would synthesize only the nondelta
function terms,

$$h_a(\tau,t) = 2(t+\lambda)^{-1}\big[\cos\omega\tau + \rho e^{-a\tau}\big]. \tag{52}$$

An appropriate design for the asymptotic weighting
function (52) is drawn in Fig. 3.

The system error $e(t)$ associated with use of the above
circuit will now be calculated. From formula (2), together
with (14), and (52), omitting the ensemble superscript $j$,

$$e(t) = 2(t+\lambda)^{-1}\int_0^t \big[\cos\omega\tau + \rho e^{-a\tau}\big]\big[s(t-\tau) + n(t-\tau)\big]\,d\tau - s(t). \tag{53}$$

This is integrated easily and shows for large $t$ that
$e(t) \to e_a(t)$ where

$$e_a(t) = 2(t+\lambda)^{-1}\int_0^t \big[\cos\omega\tau + \rho e^{-a\tau}\big]\,n(t-\tau)\,d\tau. \tag{54}$$




Fig. 3—Case 1 filter for long-time operation. 


The quantity $e_a(t)$ represents the system error during the
desired asymptotic long-time region of operation. Note
the two properties:


1) $e_a(t)$ is a function of $n(t)$ alone such that $e_a(t) \to 0$ if
$n(t) \to 0$.

2) $e_a(t)$ and its first derivative $\dot{e}_a(t)$ are bounded in
absolute value and, in fact, decrease with time under
the hypothesis that $n(t)$ is bounded.
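
The first property can be checked numerically for a noise-free input: if $n(t) = 0$, the output of the long-time Case 1 filter should approach the signal itself. The sketch below is an illustration under stated assumptions, not the paper's calculation. It takes the asymptotic weighting function as $2(t+\lambda)^{-1}[\cos\omega\tau + \rho e^{-a\tau}]$, uses the readings $a = (c^2+k^2)^{1/2}$, $d = [(\omega-c)^2+k^2][(\omega+c)^2+k^2]$, $\lambda = 4Ak(\omega^2+a^2)/(\alpha d)$, $\rho = 2a(a-k)(\omega^2-a^2)/d$, and picks hypothetical parameter values and one hypothetical member of the signal ensemble.

```python
import numpy as np

def case1_asymptotic_output(s, t, w, lam, rho, a, pts_per_unit=200):
    """Output at time t of the long-time Case 1 filter for a noise-free input s."""
    n = int(pts_per_unit * t)
    dtau = t / n
    tau = (np.arange(n) + 0.5) * dtau                 # midpoint rule on [0, t]
    weight = np.cos(w * tau) + rho * np.exp(-a * tau)
    return 2.0 / (t + lam) * np.sum(weight * s(t - tau)) * dtau

# Hypothetical parameter values
A, k, c, w, alpha = 0.5, 1.0, 1.0, 2.0, 1.0
a = np.sqrt(c**2 + k**2)
d = ((w - c)**2 + k**2) * ((w + c)**2 + k**2)
lam = 4.0 * A * k * (w**2 + a**2) / (alpha * d)
rho = 2.0 * a * (a - k) * (w**2 - a**2) / d

def s(t):
    # one hypothetical member of the signal ensemble
    return 1.0 * np.cos(w * t) + 0.5 * np.sin(w * t)

# Signal-tracking error of the asymptotic filter at increasing operating times
errors = {t: case1_asymptotic_output(s, t, w, lam, rho, a) - s(t)
          for t in (20.0, 200.0, 2000.0)}
for t, e in errors.items():
    print(t, e)
```

The residual shrinks roughly in proportion to $1/t$, which is consistent with the statement that the design is valid only in the long-time region.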


Case 2 


In Case 2, from (38) and (39), the optimum time
variable linear filter weighting function for large $t$ has an
asymptotic form (exclusive of impulse terms) given by

$$h_a(\tau,t) = 2[\omega(t+\lambda)]^{-1}\big\{\sin\omega t\,[\cos\omega\tau + \rho e^{-a\tau}] + (1 - \cos\omega t)\,[\sin\omega\tau + \sigma e^{-a\tau}]\big\}. \tag{55}$$

Again, the impulse functions give a negligible contribution
for long-time operation, and are therefore ignored. A
schematic design for this asymptotic weighting function
is shown in Fig. 4.

From (2), (36), and (55), the system error $e(t)$ corresponding
to the Case 2 circuit is determined. For large $t$,
this system error $e(t) \to e_a(t)$ where

$$e_a(t) = \int_0^t h_a(\tau,t)\,n(t-\tau)\,d\tau. \tag{56}$$

Thus, the same two properties of $e_a(t)$ relative to $n(t)$ are
seen to hold for the Case 2 circuit as the Case 1 circuit.
In addition, because of the factors $\sin\omega t$ and $(1 - \cos\omega t)$
in $h_a(\tau, t)$ of Case 2 [see (55)], it is clear that the
asymptotic system error $e_a(t)$ [see (56)] will be zero at
regular periodic intervals of time $t = 2n\pi\omega^{-1}$, ($n = 0,
1, 2, \dots$).


CONCLUSION 


A few words about the fundamental nature of this 
work. First of all, the one serious restriction on the in- 
coming signals is a possible lack of available statistical 
information. Improved computing machinery and data 
handling techniques are helping to fill this need. The class 
of random processes expressible in finite Fourier series 
ensemble form, to which the incoming signals and desired 
outputs belong, covers many technical problems. For 
example, the input signals form a Gaussian random process 




Fig. 4—Case 2 filter for long-time operation. 


if the statistics are such that $\langle ({}^{j}a_n)^2\rangle_{av}$ and $\langle ({}^{j}b_n)^2\rangle_{av}$ are equal and
independent of $n$, while all other double moments are zero,
and if the distribution of the random variables $\{{}^{j}a_n\}$ and
$\{{}^{j}b_n\}$ is a normal distribution for all $n$. This case has been
frequently studied. Secondly, it is an empirical fact that
the noise perturbation treated, one whose autocorrelation 
function combines periodicity with exponential decay, 
occurs in many diverse places. Frequently, the auto- 
correlation function may be fitted to a damped exponential 
term alone. Thirdly, it should be emphasized that exact 
expressions for finite operating periods from 0 to any time 
t have been derived. Long-time operating period situations 
are, however, also included by letting ¢ approach infinity. 
Thus, a solution has been given not for one special problem 
but for many important physical applications. 

The two-step technique which was employed to solve 
the general integral equation appearing in this paper may 
be appropriate for other similar integral equations. In 
brief, this procedure requires that one should determine 
the functional form of the optimum weighting function 
except for unknown time varying coefficients, and then 
substitute this functional form back into the original 
integral equation so as to yield sets of simultaneous time 
varying equations whose solution gives the desired co- 
efficients. A derivation so obtained is automatically 
self-checking. It is not required that the first step be
carried out through use of Laplace transform theory, as
was done in this work. Instead, one is free to use whatever
insight he might have in a particular situation to propose
a possible functional form, and proceed with the second
step of satisfying the integral equation. Additional terms
would be added, or subtracted, depending on the derived
results.

In particular, this two-step technique should be used 
whenever the noise autocorrelation function is a linear 
combination of exponential or exponential-cosine auto- 
correlation functions, instead of a single such term as is 
considered in the paper. Since it is well known that many
empirical autocorrelation functions may be fitted to such
a linear combination of basic autocorrelation functions
without difficulty, the analysis is thus able to cover these
more general situations.

The class of separable time variable weighting functions,
whose concept is abstracted from the form of the integral
equation solutions, is deemed by the author to be an
important physical entity. The synthesis techniques
discussed herein refer to this extended class of separable
filters and are not limited to the optimum filters alone.
Some open problems remain in determining under what
restrictive conditions the optimal weighting function is
of this separable form. Here, it has been demonstrated to
occur for a certain large class of optimization problems,
but the question of its presence elsewhere is not known.

The asymptotic synthesis ideas offer a fresh insight to
the mechanization problem when exact synthesis is
difficult and when asymptotic operating regions are
involved. A resulting design is necessarily valid only in
the corresponding asymptotic region of operation. Consequently,
if the apparatus is also to be used in non-asymptotic
regions, extensive tests and engineering
adjustments would be required before final production.
Some tests, of course, should always be made to compare
performance of a theoretically proposed system with,
possibly, simpler circuits motivated by engineering
experience.
APPENDIX

Proof of Theorems 1 and 2

Let

$$E^2(t) = \langle {}^{j}e^{2}(t)\rangle_{av} = \int_0^T\!\!\int_0^T h(\nu,t)\,h(\tau,t)\,\gamma_{ii}(t-\nu,\,t-\tau)\,d\tau\,d\nu - 2\int_0^T h(\nu,t)\,\gamma_{id}(t-\nu,\,t)\,d\nu + \gamma_{dd}(t,t). \tag{57}$$

Assume $h(\tau, t)$ is the optimum weighting function. Let
$N^2(t)$ be the mean square ensemble error using another
operator $h(\tau, t) + \eta g(\tau, t)$ where $\eta$ is an arbitrary real
constant.

$$N^2(t) = \int_0^T\!\!\int_0^T \big[h(\nu,t) + \eta g(\nu,t)\big]\big[h(\tau,t) + \eta g(\tau,t)\big]\,\gamma_{ii}(t-\nu,\,t-\tau)\,d\tau\,d\nu - 2\int_0^T \big[h(\nu,t) + \eta g(\nu,t)\big]\,\gamma_{id}(t-\nu,\,t)\,d\nu + \gamma_{dd}(t,t) \tag{58}$$

$$= E^2(t) - 2\eta M(t) + \eta^2 L(t) \tag{59}$$

where

$$M(t) = \int_0^T g(\nu,t)\Big[\gamma_{id}(t-\nu,\,t) - \int_0^T h(\tau,t)\,\gamma_{ii}(t-\nu,\,t-\tau)\,d\tau\Big]\,d\nu \tag{60}$$

$$L(t) = \int_0^T\!\!\int_0^T g(\nu,t)\,g(\tau,t)\,\gamma_{ii}(t-\nu,\,t-\tau)\,d\tau\,d\nu = \Big\langle \Big[\int_0^T g(\tau,t)\,{}^{j}i(t-\tau)\,d\tau\Big]^2 \Big\rangle_{av} \ge 0. \tag{61}$$

The result of Theorem 1 follows from showing that
$E^2(t)$ is a minimum if and only if $M(t)$ [see (60)] is identically
zero for any choice of the operator $g(\nu, t)$, $0 \le \nu \le T$.
If $M(t) = 0$ for any $g(\nu, t)$, $0 \le \nu \le T$, then $N^2(t) =
E^2(t) + \eta^2 L(t) \ge E^2(t)$ for any $\eta$, since $L(t) \ge 0$. Therefore,
$E^2(t)$ is a minimum. This proves the sufficiency of
Theorem 1.

Conversely, if $M(t) \ne 0$ for some $g(\nu, t)$, $0 \le \nu \le T$, then
one can make $M(t) > 0$ by changing the sign of that particular
$g(\nu, t)$ if required. Now,

$$N^2(t) = E^2(t) - 2\eta\big[M(t) - (\eta/2)L(t)\big]. \tag{62}$$

For $\eta$ sufficiently small and positive, $M(t) - (\eta/2)
L(t) > 0$. Then $N^2(t) < E^2(t)$, giving a contradiction to
the original claim that $E^2(t)$ was a minimum value. This
proves the necessity of Theorem 1.

The statement of Theorem 2 results by substituting
the integral equation requirement of Theorem 1 into
the original expression (57) for $E^2(t)$.


Functional Form of $h(\tau, t)$


Starting from (19) and (20), one can demonstrate after
several steps that


i 


i= 7, t) = Ae” / h(y, be" ” ~cosc(t — x — y) dy 
| 0 
E) t-— 
+ 2Ak i ee) / Aas 
0 0 (63) 
e *"-?™ cog e(t — x — y) dy | do 
N N 
= >> u,(t) cosnwr + >> y,(t) sinner. (64) 
ne 
L(t —v, t) = | hy, \e*°-” cose@ — y) dy 
F (65) 


ii if aly, Her sin ev — y) dy. 
0 


yet A(s, 4) be the Laplace transform of X(z, ¢) with 
pect to z, ¢ being held constant; similarly, let 
'— p, t) be the Laplace transform of A(t — », ¢) with 
pect to v, ¢ being held constant. Upon taking the 
blace transform of (63) and (64) with respect to a, 
br substituting (65), and using the relation A), (0, t) = 
{ u(t) which results from (63) when x = 0, one obtains 


k[(s + als, t) + als, 6] 


= LO + » [sF,(t) + noG,(d)][s° + no’) (66) 




where the forms of L(¢), F,,(¢) and G,,(¢) are not important. 
The Laplace transforms of (65) with respect to v, ¢ being 
held constant, are 


Mb — p; tf) =", Olp ee hee | 
Alt — p,) = A(p, Ylell@ +k) + ¢7]7 


(67) 


where H(p, t) represents the Laplace transform of the 
unknown weighting function h(v, t) with respect to », t 
being held constant. 

In order to solve for H(p, ¢) from (67), it is necessary 
to change the Laplace transforms X(s, ¢) of (66) into 
expressions of type \(¢ — p, t). Take the inverse Laplace 
transform of (66) with respect to s, ignoring all boundary 
conditions. Then replace x in A(z, t) by t — v and take 
new Laplace transforms with respect to v. This yields the 
desired relation, and gives for H(p, t), 


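$$H(p,t) = \sum_{n=1}^{N}\big[pM_n(t) + n\omega R_n(t)\big]\big[p^2 + n^2\omega^2\big]^{-1} + (p + a)^{-1}S(t) + (p - a)^{-1}Q(t) \tag{68}$$

where

$$a = (c^2 + k^2)^{1/2} \tag{69}$$

and $M_n(t)$, $R_n(t)$, $S(t)$ and $Q(t)$ are unknown time varying
coefficients.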

Finally, the inverse Laplace transform of (68), together
with additional impulse factors $U(t)\,\delta(\tau)$ and $V(t)\,\delta(t - \tau)$
(which are introduced to account for previously neglected
boundary conditions) shows that the optimum time
variable linear weighting function $h(\tau, t)$ has the functional
form,

$$h(\tau,t) = \sum_{n=1}^{N} M_n(t)\cos n\omega\tau + \sum_{n=1}^{N} R_n(t)\sin n\omega\tau + S(t)e^{-a\tau} + Q(t)e^{a\tau} + U(t)\,\delta(\tau) + V(t)\,\delta(t-\tau), \qquad a = (c^2 + k^2)^{1/2}, \tag{70}$$

where, for ease in later computation, $\delta(\tau)$ will be defined
as the usual (Dirac) delta function multiplied by two,
that is

$$\delta(\tau) = 0 \quad \text{for any } \tau \ne 0, \qquad \int_0^t \delta(\tau)\,d\tau = \int_{-t}^{0} \delta(\tau)\,d\tau = 1 \quad \text{for any } t \ne 0. \tag{71}$$

Altogether, there are $2N + 4$ time varying coefficients
which must be determined in (70).


Determination of Time Varying Coefficients

For $h(\tau, t)$ as given by (70), let

$$J(\nu,t) = \int_0^t h(\tau,t)\,e^{-k|\nu-\tau|}\cos c(\nu-\tau)\,d\tau. \tag{72}$$

Then, clearly, $h(\tau, t)$ will be the optimum weighting
function if

$$I(\nu,t) = A\,J(\nu,t) \tag{73}$$

since the optimum weighting function must satisfy (19)
and (20).

By equating corresponding terms on both sides of (73),
the $2N + 4$ unknown time varying coefficients of (70)
may be shown to satisfy the following system of $2N + 4$
simultaneous equations, whose consistency is guaranteed
on the basis of physical validity of the processes under
investigation.


$$2AkA_n M_n(t) = d_n\big[\mu_n(t)\cos n\omega t + \nu_n(t)\sin n\omega t\big]$$
$$2AkA_n R_n(t) = d_n\big[\mu_n(t)\sin n\omega t - \nu_n(t)\cos n\omega t\big] \qquad (n = 1, 2, \dots, N) \tag{74}$$
N N ‘ 
UH = -(S wa + THR 
+ bS(0 + 1,00) 
| (75) 
Vi) = Zhp be MG) A be tee) 
+ be SO + QO) 
3M) + Y WORM + bS() + BQ) = 0 
1 1 (76) 


N N 
> 0° M0) + > OP RO + be SO + bse"Q@) = 0 


where 
$$A_n = n^2\omega^2 + c^2 + k^2, \qquad d_n = A_n^2 - 4n^2\omega^2c^2 \tag{77}$$
db.” = —KkA,, d,b,” = nw(A, — 2c’) sin nat 
— kA, cos nwt (78) 
d,b.? = c(A, — 2n’w’), db,” = 2nack sin nat 
+ ¢(2n’w — A,) cos nwt 
d,b,” = nw(A, — 2c’), d,b,” = —kA, sin nwt 
+ nw(2c — A,) cos nwt (79) 
d,b,° = —2nack, d,b, = c(2n’w — A,) sin nwt 
— 2nuck cos nwt 
while 
ee =i ee =i 
Ge =a ae b, (2a) | (80) 
b. = c[2a(a — hk], b, = —c[2a(a + k)]’ 
nip x. -1 ane =1 
bgp (20) ee b, (2a) (81) 
bs = c[2a(a + kJ," bs = —c[2a(a — k)]™. 


Assuming $M_n(t)$ and $R_n(t)$ remain finite as $t$ approaches
infinity, it follows that $S(t)$ is finite as $t$ approaches
infinity while $Q(t)$ approaches zero. The $2N + 4$ time
varying coefficients are determined by the following
procedure: First, $S(t)$ and $Q(t)$ are found in terms of
$M_n(t)$ and $R_n(t)$ by (76); then, $U(t)$ and $V(t)$ are known as
functions of $M_n(t)$ and $R_n(t)$ by (75). Finally, $M_n(t)$ and
$R_n(t)$ are calculated from the $2N$ simultaneous equations (74).
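
The factor $2kA_n/d_n$ that links $M_n(t)$ and $R_n(t)$ to $\mu_n(t)$ and $\nu_n(t)$ in (74) can be read as the cosine transform of the noise kernel $e^{-k|x|}\cos cx$ at the frequency $n\omega$, taking $A_n = n^2\omega^2 + c^2 + k^2$ and $d_n = A_n^2 - 4n^2\omega^2c^2$. The sketch below checks that reading by numerical integration. It is an illustration only; the function names, the truncation length and the parameter values are assumptions.

```python
import numpy as np

def kernel_transform_numeric(k, c, freq, length=40.0, n=400000):
    """Integral over the whole line of exp(-k|x|) cos(c x) cos(freq x), by midpoint rule.

    The integrand is even, so twice the integral over [0, length] is used;
    the tail beyond `length` is negligible for k * length >> 1.
    """
    dx = length / n
    x = (np.arange(n) + 0.5) * dx
    return 2.0 * np.sum(np.exp(-k * x) * np.cos(c * x) * np.cos(freq * x)) * dx

def kernel_transform_closed(k, c, freq):
    """Closed form 2 k A_n / d_n with freq playing the role of n * omega."""
    A_n = freq**2 + c**2 + k**2
    d_n = A_n**2 - 4.0 * freq**2 * c**2
    return 2.0 * k * A_n / d_n

# Hypothetical values: k = 1, c = 1, n * omega = 2
numeric = kernel_transform_numeric(1.0, 1.0, 2.0)
closed = kernel_transform_closed(1.0, 1.0, 2.0)
print(numeric, closed)
```

For these values $A_n = 6$ and $d_n = 20$, so the closed form is $0.6$, which also equals $k/[k^2 + (n\omega - c)^2] + k/[k^2 + (n\omega + c)^2]$.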


Proof of Theorem 3 


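Suppose $h_0(\tau, t)$ is of a separable form, for example,

$$h_0(\tau,t) = f_0(t)\,g_0(\tau) \quad \text{with} \quad h_a(\tau,t) = f_a(t)\,g_0(\tau) \sim h_0(\tau,t) \quad \text{as } t \to \infty \ (t \to 0). \tag{82}$$

Then, $f_a(t) \sim f_0(t)$ as $t \to \infty$ ($t \to 0$).
In practical cases of interest, a bounded input will produce
a bounded output. Hence, there exists some constant
$K > 0$ such that

$$|r_0(t)| = \Big|\,f_0(t)\int_0^t g_0(\tau)\,i(t-\tau)\,d\tau\,\Big| < K \quad \text{for all } t \ge 0. \tag{83}$$

The difference,

$$|r_a(t) - r_0(t)| = \Big|\frac{f_a(t)}{f_0(t)} - 1\Big|\,|r_0(t)| < K\,\Big|\frac{f_a(t)}{f_0(t)} - 1\Big| \to 0 \quad \text{as } t \to \infty \ (t \to 0). \tag{84}$$

This convergence need not take place outside the asymptotic
region. Similarly, if

$$h_0(\tau,t) = \sum_{n=1}^{M} f_{n0}(t)\,g_n(\tau) + U_0(t)\,\delta(\tau) + V_0(t)\,\delta(t-\tau) \tag{85}$$

and

$$h_a(\tau,t) = \sum_{n=1}^{M} f_{na}(t)\,g_n(\tau) + U_a(t)\,\delta(\tau) + V_a(t)\,\delta(t-\tau) \tag{86}$$

where

$$f_{na}(t) \sim f_{n0}(t) \quad \text{as } t \to \infty \ (t \to 0), \quad n = 1, 2, \dots, M,$$
$$U_a(t) \sim U_0(t) \quad \text{as } t \to \infty \ (t \to 0), \tag{87}$$
$$V_a(t) \sim V_0(t) \quad \text{as } t \to \infty \ (t \to 0),$$

then $r_a(t) \to r_0(t)$ necessarily only as $t \to \infty$ ($t \to 0$).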


ACKNOWLEDGMENT 


This material was first studied at Northrop Aircraft,
Inc., Hawthorne, Calif. In its initial formulation, it
received limited circulation in a company report.⁷ The
author is grateful to his former colleagues at Northrop
for their helpful criticisms. The present paper represents
both a revision and extension of the earlier effort. The
author wishes to thank the Ramo-Wooldridge Corporation,
Los Angeles, Calif., for providing a stimulating
atmosphere and facilities with which to complete the work.


7 J. S. Bendat, "Optimum Time-Variable Filtering for Nonstationary
Random Processes," Northrop Aircraft, Inc., Rep. No.
NAI-54-771; December, 1954.


Contributors


k Ackerlind (A ’38—VA ’39—SM ’52) 
orn on July 9, 1910, in New York, 
! He-received the B.H.E. degree from 
the Polytechnic In- 
stitute of Brooklyn 
in 1932. In 1934, he 
received the M.S. 
in H.E. degree from 
Columbia —_ Univer- 
sity. From 1934 to 
1937, he was a 
research fellow at 
the Polytechnic In- 
stitute of Brooklyn. 
After obtaining the 
doctorate in elec- 
trical engineering 
Polytechnic Institute of Brooklyn, 
pined Hazeltine Electronics, Little 
| N.Y., as an engineer. In 1941, Dr. 
lind became subsection head at the 
Research Laboratory and was 
d in analytical and experimental 
igations of direction finders. 
pigined Northrop Aircraft in 1946 and 
Section head responsible for digital 
iter development. In 1949, he became 
| supervisor at the Jet Propulsion 
atory, California Institute of Tech- 
vy, and worked on analog computer 
pet Since 1953, Dr. Ackerlind 
been with the Radio Corporation of 
ip Los Angeles, Calif., and is now 
iger of Systems Engineering. 
‘is a member of Eta Kappa Nu and 
he Xi. 


ACKERLIND 


*, 
Oo 


jus S. Bendat was born October 26, 
lin Chicago, Ill. He received the A.B. 
f in mathematics and physics from 
the University of 
California in 1944, 
and the M.S. degree 
physics from the 
California Institute 
of Technology in 
1948. In 1953, he 
obtained the Ph.D. 
degree in math- 
ematics from the 
University of South- 
ern California. 

In October, 1955,
Dr. Bendat joined
the staff of the Control Systems Division of
the Ramo-Wooldridge Corporation, Los
Angeles, Calif., where he is now engaged on
advanced studies of optimum filter theory
and random phenomena. He is also a lecturer
in mathematics at the University of Southern
California, specializing in applied mathematics
courses.
Past experience includes work as an
experimental physicist on the Manhattan
Project at the Radiation Laboratory,
University of California, from 1942 to 1945;
| officer in the U. S. Navy during 
1946; assistant professor of aero- 
cal engineering at the U.S.C. College 
eronautics in 1948-1949; research 


engineer with the Guided Missile Division 
of Northrop Aircraft Inc., from 1953 to 
1955. 

Dr. Bendat is a member of Sigma Xi, 
Phi Beta Kappa, Pi Mu Epsilon, the 
American Mathematical Society, the Math- 
ematical Association of America, and the 
Society of Industrial and Applied Math- 
ematics. 


7 
“e 


John L. Brown, Jr., was born in Ellen- 
ville, N.Y., on March 6, 1925. He received 
the B.S. degree in mathematics from Ohio 
University in 1948 
after serving in the 
U.S. Army for three 
years during World 
War II. From 1948 
to 1951, he held a 
fellowship in applied 
mathematics at
Brown University 
and received the 
Ph.D. in applied 
mathematics from 
that institution in 
1953. 

In 1951, Dr. Brown joined the staff of 
the Ordnance Research Laboratory, 
Pennsylvania State University, as a member 
of the Theoretical Studies Section. At 
present, he is an associate professor of 
engineering research, engaged in applied 
mathematical research related to the 
field of underwater acoustics. 

Dr. Brown is a member of the American 
Mathematical Society, Acoustical Society 
of America, Society for Industrial and 
Applied Mathematics, Phi Beta Kappa, 
and Sigma Xi. 


J. L. Brown, Jr.




For a photograph and biography of L. 
Lorne Campbell, see page 155 of the 
December, 1956 issue of IRE TRANSAC-
TIONS ON INFORMATION THEORY.




Donald A. Darling was born in Los 
Angeles, Calif., on May 4, 1915. He received 
the B.A. degree from the University of 
California, Los 
Angeles, in 1939 and 
the Ph.D. degree 
from the California 
Institute of Tech- 
nology in 1947. 

Dr. Darling was 
a research director 
at the California 
Institute of Tech- 
nology from 1942- 
1945, a teaching 
fellow from 1944-
1947, and a member 
of the research staff of the Naval Ordnance 
Testing Station, Inyokern, Calif., during 
1945-1946. 

He was a research associate, Cornell Uni- 
versity, Ithaca, N.Y., 1947-1948; assistant 
professor, Rutgers University, New Bruns-
wick, N.J., 1948-1949; instructor, Univer-
sity of Michigan, Ann Arbor, Mich., 1949-
1950; assistant professor, 1950-1955; and he
is now associate professor of mathematics.


D. A. Darling

He has served as visiting professor at 
Columbia University, 1952-1953, and the 
University of Chicago, 1955-1956. He has 
been a consultant for The RAND 
Corporation, Santa Monica, Calif., since 
1949; for the Operations Research Office, 
Washington, D.C., since 1952, and for the 
Engineering Research Institute of the 
University of Michigan since 1950. 

He is a member of the American Math- 
ematical Society, Mathematical Association 
of America, and Sigma Xi, and a fellow of 
the Institute of Mathematical Statistics. 




J. Dugundji was born in New York, 
N.Y. on August 30, 1919. He received 
the B.A. degree from New York University 
in 1940, and in 1942 
began four years of 
service with the 
Army Air Force. In 
1948, he received the 
Ph.D. degree in 
mathematics from 
Massachusetts Insti- 
tute of Technology. 
He is presently an
associate professor
of mathematics at
the University of
Southern California,
and, since 1953, has been a mathematical
consultant to the Radio Corporation of
America in Los Angeles, Calif.

Dr. Dugundji is a member of Phi Beta
Kappa, Sigma Xi, and the American
Mathematical Society.


J. Dugundji




For a photograph and biography of Paul 
E. Green, Jr., see page 97 of the June, 1956
issue of IRE TRANSACTIONS ON INFOR-
MATION THEORY.



A. Hauptschein (S ’47-A ’50-M ’55)
was born on October 31, 1925, in New 
York, N.Y. He received the B.S. degree in 
electrical engineering 
from Pennsylvania 
State University in 
June, 1947. In June, 
1948 he received the 
M.S. degree in elec- 
trical engineering and
in February, 1957,
the professional (E.
E.) degree, both
from Columbia Uni- 
versity. 

From 1948 to 1952,
he was employed by
Airborne Instruments Laboratory as a proj-
ect engineer in the antenna and special
devices section, and worked on the design
of communication, navigation, and homing
antennas for high speed aircraft and heli-
copters.


A. Hauptschein

Since 1952, Mr. Hauptschein has been 
associated with the research division of 
New York University, department of 
electrical engineering, in the capacity of 
engineering scientist and instructor. At 
New York University he has been con- 
cerned with the design of a microwave 
impedance measuring bridge and is presently 
engaged in an evaluation study for com- 
munication systems. 

Mr. Hauptschein is a member of Tau 
Beta Pi, Eta Kappa Nu, Pi Mu Epsilon, 
and Sigma Xi. 


Saburo Muroga was born in Numazu,
Japan, on March 15, 1925. He graduated 
from the electrical engineering department 
of the University of 
Tokyo in 1947. He 
was engaged in theo- 
retical research on 
pulse modulation and 
narrow-band voice 
transmission system 
in the Railway 
Technical Labora- 
tories from 1947 to 
1950 and in the
Radio Regulatory 
Commission from 
1950 to 1951. 

In 1951, he joined the staff of the Elec- 
trical Communication Laboratories of the 
Nippon Telegraph and Telephone Public 
Corporation and has been engaged in
research on communication theory and
also in construction of a parametronic
digital computer. He studied at the Research
Laboratory of Electronics of the Massa-
chusetts Institute of Technology in 1953
and at the Digital Computer Laboratory of
the University of Illinois in 1954.

He is a member of the Institute of 
Electrical Communication Engineers of 
Japan and the Physical Society of Japan. 


S. Muroga




Leonard S. Schwartz (S ’42-A ’45-
SM ’47) was born on May 28, 1914, in 
Pittsburgh, Pa. He received the B.S. and 
M.S. degrees in phys- 
ics from the Univer- 
sity of Pittsburgh. 
While in the military 
service during World 
War II, Mr. Schwartz 
was engaged in ra- 
dar research and de- 
velopment at the Ra- 
diation Laboratory 
of M.I.T. and later at 
the Naval Research 
Laboratory. After the 
war he remained as a 
civilian at Naval Research Laboratory until 
1947, when he joined the Hazeltine Elec- 
tronics Corporation to work on radar devel- 
opments. In 1952, he joined the research 
division of the college of engineering of 
New York University where he has been
directing projects concerned with appli-
cations of communication theory.


L. S. Schwartz

Mr. Schwartz is a member of the Ameri-
can Physical Society, the IRE, the AIEE,
Sigma Pi Sigma, and Sigma Xi. 



Arnold J. F. Siegert was born in Dresden, 
Germany, in 1911. He received the Ph.D. 
degree at the University of Leipzig,
Germany, in 1934.

He worked as a 
Lorentz Foundation 
fellow in Leiden, 
Holland, from 1934-
1936; as a teaching
assistant in the
Physics Department,
Stanford University,
Stanford, Calif.,
during 1936-1939;
as a physicist for
the Texas Company
from 1939-1942, and
the Stanolind Oil and Gas Company,
Tulsa, Okla., during 1942.

From 1942 to 1946, he was engaged in 
war research at the Radiation Laboratory, 
M.I.T., Cambridge, Mass., and from 1946
to 1947 he was an associate professor in 
the Physics Department, Syracuse Uni- 
versity, Syracuse, N.Y. Since 1947, he has 
been professor of physics at Northwestern 
University, Evanston, Ill. 

During 1953-1954, Dr. Siegert held a 
Guggenheim Fellowship and worked at 
the Institute for Advanced Study, Prince- 
ton, N.J. He is a consultant for the Stano- 
lind Oil and Gas Company and The RAND 
Corporation, Santa Monica, Calif. 

He is a fellow of the American Physical 
Society and a member of the Society of 
Exploration Geophysicists and the Institute 
for Mathematical Statistics. 


A. J. F. Siegert


George C. Sponsler was born December 
2, 1927, in Collingswood, N.J. He attended 
Princeton University and as an under-
graduate held an RCA
scholarship. He was 
graduated in 1949 
with highest honors, 
receiving the B.S. de- 
gree in engineering. 
In 1951, he received 
the A.M. degree and 
in 1952, the Ph.D. de-
gree, also from Prince-
ton University. He
was elected to Phi 
Beta Kappa and as 
a graduate student 
was awarded the Sayr Fellowship and a 
G. E. Coffin National Fellowship in elec- 
tronics. 

Between terms, he was employed by the 
Carnegie Institution Department of Ter-
restrial Magnetism, the Johns Hopkins Uni-
versity Applied Physics Laboratory, the
RCA Laboratories, and the Brookhaven 
National Laboratory. During this period he 
devised and tested equipment for a cosmic 
ray telescope, mass spectrograph, and an
ionospheric recorder, and performed ex-
periments with secondary electrons. After
receiving his doctor’s degree, he was
employed by the Massachusetts Institute
of Technology, Lincoln Laboratory, for
four years, where he worked on the statistics
of radar detection, electron optics, and the
mathematics of systems analysis. In
addition to various mathematical analyses,
he devised an automatic electron trajectory
tracer.


G. C. Sponsler

At present, Dr. Sponsler is fulfilling a
two-year contract as liaison officer for
electronics with the London branch of the
Office of Naval Research.


George L. Turin (M ’56) was born in
New York, N.Y., on January 27, 1930.


course 
electrical é 
ing, with Phil 
Corp. as the @ 


In the summer of
1952, he was a
Veta , 
Fellow at Marconi
Wireless Telegraph


G. L. Turin 


Co. in England.

From 1952 to 1956, he worked at M.I.T.
Lincoln Laboratory in the field of statistical
communication theory, first as a staff
member, and later as a research assistant
while completing his doctoral studies.
During this latter period, he was also a
consultant to the firm of Edgerton, Ger-
meshausen and Grier. He received the
Sc.D. degree in electrical engineering from
M.I.T. in 1956.

Since July, 1956, Dr. Turin has been
engaged in radar research studies at Hughes
Aircraft Co. He also currently teaches
part-time at the University of Southern
California.

Dr. Turin is a member of Eta Kappa
Nu, Tau Beta Pi, and Sigma Xi.




Jean A. Ville was born in 1910 in Mar-
seilles, France. He holds the Ph.D. degree
in mathematics and is also a graduate in
law (1939). He was
professor of me-
chanics at the Uni-
versities of Poitiers
and Lyon.

In 1948, he entered
the technical staff of
the Société Alsa-
cienne de Construc-
tions Mécaniques in
Paris for studies in
electronics and prob-
ability.

Dr. Ville is now a
member of the Board of Directors of
S.A.C.M., and teaches automatic compu-
tation at the Faculty of Sciences of Paris.


J. A. VILLE 




INFORMATION FOR AUTHORS 




Authors are requested to submit editorial correspondence or technical manu- 
scripts to the Publications Chairman for possible publication in the PGIT TRANS-
ACTIONS. Papers submitted should include a statement as to whether the material
has been copyrighted, previously published, or accepted for publication elsewhere. 


Papers should be written concisely, keeping to a minimum all introductory 
and historical material. It is seldom necessary to reproduce in their entirety previ- 
ously published derivations, where a statement of results, with adequate references, 
will suffice. 


To expedite reviewing procedures, it is requested that authors submit the 
original and two legible copies of all written and illustrative material. The manu- 
script should be double-spaced, and the illustrations drawn in India ink on drawing 
paper or drafting cloth. Each paper should include a carefully written abstract of 
not more than 200 words. Upon acceptance, papers should be prepared for publica-
tion in a manner similar to those intended for the PROCEEDINGS OF THE IRE.
Further instructions may be obtained from the Publications Chairman. Material 
not accepted for publication will be returned. 


IRE TRANSACTIONS ON INFORMATION THEORY is published four times a year,
in March, June, September, and December. A minimum of one month must be 
allowed for review and correction of all accepted manuscripts. In addition, a period 
of approximately two months is required for the mechanical phases of publication 
and printing. Therefore, all manuscripts must be submitted three months prior 
to the respective publication dates. In addition, the IRE CONVENTION RECORD
is published in July, and a bound collection of Information Theory papers delivered 
at the annual IRE National Convention is mailed gratis to all PGIT members. 


All technical manuscripts and editorial correspondence should be addressed to 
Laurin G. Fischer, Federal Telecommunication Labs., 492 River Road, Nutley, 
N. J. Local Chapter activities and announcements, as well as other nontechnical 
news items, should be addressed to Nathan Marchand, Marchand Electronic Labs., 
Riversville Road, Greenwich, Conn. 


